Abstract:

We give an alternative proof that every two-person non-zero-sum absorbing positive recursive stochastic game with finitely many states has approximate equilibria, a result proven by Nicolas Vieille. Our proof uses a state-specific discount factor which is similar to the conventional discount factor only when there is only one non-absorbing state. Additionally we show that if the players engage in time-homogeneous Markovian behavior relative to some finite state space of size $n$, then for the existence of an $\epsilon$-equilibrium it suffices that a one-stage deviation brings no more than an $\epsilon^3/(nM)$ gain to a player, where $M$ is a bound on the maximal difference between any two payoffs.
1 Introduction



A two-player stochastic game is played in stages. At every stage the game is in some state of the world. Both players are informed of the whole history, including the current state, and based on this information they simultaneously choose a pair of actions. The current state and the pair of actions chosen determine both a stage payoff for each of the players and a probability distribution according to which a new state is chosen.

For any $\epsilon > 0$, an $\epsilon$-equilibrium of a game is a profile of strategies, one for each player, such that no player can gain more than $\epsilon$ in payoff by choosing a different strategy, given that all the other players do not change their strategies. A game has approximate equilibria if for every $\epsilon > 0$ it has an $\epsilon$-equilibrium. The value of a zero-sum game, should one exist, is the unique cluster point of the $\epsilon$-equilibrium expected payoffs (for the first player) as $\epsilon$ goes to zero. The undiscounted payoff of a player in a stochastic game with infinitely many stages, when defined, is the limit, as the number of stages goes to infinity, of the average over the stages of the player's expected stage payoffs. Unless specified otherwise, the payoffs of a stochastic game are undiscounted.
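As a minimal illustration of the definition, the sketch below checks whether a pair of mixed strategies is an $\epsilon$-equilibrium of a one-shot bimatrix game (a single stage, not the stochastic game itself). The function names and the list-of-lists payoff format are assumptions made for this example. Only pure deviations need to be checked, since against a fixed opponent strategy a player's expected payoff is linear in his own mixed strategy.

```python
def expected_payoff(payoff, x, y):
    """Bilinear expected payoff of the mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * y[j] * payoff[i][j]
               for i in range(len(x)) for j in range(len(y)))


def max_deviation_gain(A, B, x, y):
    """Largest gain either player can obtain by a unilateral deviation.

    A and B hold the payoffs of the row and column player. Pure deviations
    suffice because each player's payoff is linear in his own strategy.
    """
    best_row = max(sum(y[j] * A[i][j] for j in range(len(y))) for i in range(len(x)))
    best_col = max(sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(y)))
    return max(best_row - expected_payoff(A, x, y),
               best_col - expected_payoff(B, x, y))


def is_eps_equilibrium(A, B, x, y, eps):
    """True if no player gains more than eps by deviating."""
    return max_deviation_gain(A, B, x, y) <= eps


# Matching pennies: the uniform profile is an exact equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-a for a in row] for row in A]
print(is_eps_equilibrium(A, B, [0.5, 0.5], [0.5, 0.5], 0.0))  # True
print(max_deviation_gain(A, B, [0.6, 0.4], [0.5, 0.5]))       # about 0.2
```

The skewed profile in the last line is an $\epsilon$-equilibrium only for $\epsilon$ of at least about $0.2$, the gain available to the column player.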

Shapley (1953) presented the model of stochastic games and proved that discounted zero-sum games always have a value obtainable with stationary optimal strategies. This result was generalized to equilibria in $n$-player non-zero-sum discounted games by Fink (1964).

An absorbing state is such that the play never leaves this state once it is 
reached. Kohlberg (1974) proved that every two-player zero-sum stochastic 
game with only one non-absorbing state has a value. Based on the work of 
Bewley and Kohlberg (1976), Mertens and Neyman (1981) generalized this 
result, and proved that every zero-sum stochastic game has a value. 

A stochastic game is recursive if the stage payoff at all non-absorbing states is zero, no matter what the players do. A recursive stochastic game is positive recursive if there is a player who receives only positive payoffs at all absorbing states. A positive recursive stochastic game is absorbing if the player who receives these positive payoffs can force the play toward absorption.

Existence of approximate equilibria in two-player non-zero-sum stochastic games with only one non-absorbing state was proven by Thuijsman and Vrieze (1989). In their proof Thuijsman and Vrieze considered a sequence of stationary equilibria of the discounted game as the discount factor tends to 1, and they constructed different types of $\epsilon$-equilibrium strategies according to various properties of the sequence.

Vieille (2000a) showed that for approximate equilibria to exist in every two-player non-zero-sum stochastic game with finitely many states it is sufficient to prove this for the sub-class of absorbing positive recursive games. Furthermore Vieille (2000b, 2000c) proved that indeed all games in this sub-class have approximate equilibria.

In the present paper, we provide an alternative proof of Vieille's result for absorbing positive recursive games. The primary difference between our proof and Vieille's lies in the use of a kind of discount factor rather than Vieille's undiscounted evaluation. This discount factor is state specific and is similar to the conventional discount factor only when there is only one non-absorbing state. We were inspired by the Thuijsman and Vrieze article and by their confidence that their ideas could deliver the same result for finitely many states. Our goal was to confirm their optimism by demonstrating the great versatility of the discounting concept.

In positive recursive games, discount factors for the player receiving positive absorbing payoffs persuade him to make moves that push the game toward absorption. Let us call this player the second player. The serious problem with generalizing the Thuijsman and Vrieze approach directly is that the usual discounted evaluation does not discriminate between the time spent at the state at which a decision is made and the time spent at the other states that might follow this decision. As long as the second player at a given state chooses between two moves that do not involve returning to that state, his evaluation of those moves in an appropriate discounted game should be based upon his undiscounted evaluation. Play that never returns to this state before absorption but visits other states arbitrarily many times receives no discount, whereas play that re-visits the initial state $n$ times receives a $(1-\delta)^n$ discount, regardless of its visits to other states.
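To make the contrast concrete, here is a toy sketch, under our own simplifying convention, that weights a single path of non-absorbing states (starting at the decision state and ending just before absorption): the conventional evaluation discounts every stage, while the evaluation described above discounts only the returns to the initial state. The path encoding and both function names are hypothetical, and the sketch ignores the refinements of the actual state-specific discount introduced in Section 5.

```python
def conventional_weight(path, delta):
    """Factor (1 - delta) for every stage after the first one on the path."""
    return (1 - delta) ** (len(path) - 1)


def state_specific_weight(path, delta):
    """Factor (1 - delta) only for each re-visit of the initial state path[0]."""
    revisits = sum(1 for s in path[1:] if s == path[0])
    return (1 - delta) ** revisits


# A path that wanders among other states but never returns to 's'.
wander = ['s', 't', 't', 'u', 't']
print(conventional_weight(wander, 0.1))    # 0.9 ** 4, about 0.656
print(state_specific_weight(wander, 0.1))  # 1.0: no discount at all
# A path that re-visits 's' twice.
back = ['s', 't', 's', 'u', 's']
print(state_specific_weight(back, 0.1))    # 0.9 ** 2, about 0.81
```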

We see no way to generalize our proof to three-player games (and such a generalization appears highly unlikely). On the other hand, we cannot dismiss the possibility (see also Solan, 1999, where discounted evaluations were used to understand some three-player undiscounted stochastic games). If the compactification of a strategy space creates discontinuities in the undiscounted payoffs, a discounted evaluation may handle the points of discontinuity successfully. A false impression that discounting is useless for understanding the undiscounted game may result from a lack of knowledge of how to turn off the discount where one is sufficiently far from the points of discontinuity. As we will see below, knowing when to turn off the discount is central to our approach.

The secondary difference between our proof and Vieille's is that the mathematics we use is entirely elementary. No deep theorems of mathematics are required; for example, there is no use of the theory of semi-algebraic functions. What we need from the theory of Markov chains is very elementary and is proved entirely in this paper. Due to our discounting approach, we work with taboo probabilities rather than with the directed-graph perspective of Freidlin and Wentzell (1984).

The only theorem we quote instead of proving is Doob's submartingale inequality, a generalization of Kolmogorov's inequality and also an easy theorem to prove. Applying the inequality, we show that if the players engage in time-homogeneous Markovian behavior relative to some finite state space of size $n$, then for the existence of an $\epsilon$-equilibrium it suffices that a one-stage deviation brings no more than an $\epsilon^3/(nM)$ gain to a player, where $M$ is a bound on the maximal difference between any two payoffs.
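For readers who want to see the inequality at work, the following sketch checks Kolmogorov's inequality, the special case of Doob's inequality applied to the submartingale $S_k^2$, by exact enumeration for a simple symmetric random walk $S_k$ with $\pm 1$ steps: $P(\max_{k \le n} |S_k| \ge \lambda) \le E[S_n^2]/\lambda^2 = n/\lambda^2$. The walk, the function names and the parameter values are our own choice for illustration; this is not the martingale used later in the paper.

```python
from fractions import Fraction
from itertools import product


def max_excursion_probability(n, lam):
    """Exact P(max_{1<=k<=n} |S_k| >= lam) for a simple symmetric random walk."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        position, reached = 0, False
        for step in steps:
            position += step
            if abs(position) >= lam:
                reached = True
                break
        hits += reached
    return Fraction(hits, 2 ** n)


def kolmogorov_bound(n, lam):
    """E[S_n^2] / lam^2 = n / lam^2, since the steps are independent with variance 1."""
    return Fraction(n, lam ** 2)


for lam in (4, 5, 6):
    p = max_excursion_probability(10, lam)
    assert p <= kolmogorov_bound(10, lam)
    print(lam, p, kolmogorov_bound(10, lam))
```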

Countably many states 

We developed our unorthodox approach to stochastic games with the hope that it would deliver approximate equilibrium existence for all two-person non-zero-sum stochastic games with countably many states. We have failed in this attempt.

The main problem is that our approach (and that of Vieille) rests ultimately on the pigeonhole principle. If the expected number of visits to every non-absorbing state is finite, then with probability one an absorbing state is reached. This does not hold if there are infinitely many non-absorbing states.

In general, what is the difficulty in proving approximate equilibrium existence for non-zero-sum two-person stochastic games with countably many states? Several important positive results need to be mentioned. Maitra and Sudderth (1991) proved that all zero-sum stochastic games with countably many states have values. In a game of perfect information, the players take turns making their moves and each player knows the previous moves of the other players; the classic example is chess. A Blackwell game is identical in transition structure to a stochastic game, but the payoffs are determined by a Borel measurable function of the histories of play. Martin (1975) proved that all zero-sum Blackwell games of perfect information have values, and Mertens and Neyman (in Mertens 1987) extended Martin's result to non-zero-sum games with finitely many players. Using his result for games of perfect information, Martin (1998) proved that all zero-sum Blackwell games have values.

The differences between non-zero-sum stochastic games (with simultaneous moves) and either non-zero-sum Blackwell games of perfect information or zero-sum Blackwell games with simultaneous moves are formidable. The probability of absorption at a stage in a stochastic game can also be a lower bound on that stage's deviation from pure equilibria (for example, see the "Big Match" in Blackwell and Ferguson, 1968). In the $\epsilon$-equilibria of many games, including the absorbing positive recursive variety, absorption must become a near certainty while the cumulative opportunity to exploit deviations must not exceed $\epsilon$. Therefore one needs stage-by-stage approximate equilibria to translate into cumulative approximate equilibria. In zero-sum games this is not so problematic, because the gains to one player from deviation equal the losses to the other player. But with two-person non-zero-sum games one must consider functions with values in $\mathbb{R}^2$; the potential independence of the two values and the need for a cooperative solution frustrate attempts to generalize the approaches that were successful with zero-sum games. On the other hand, if the moves are made simultaneously, how does one know that the other player is adhering to a cooperative agreement? So far the main answer has been to require Markovian behavior from each player, accompanied by statistical testing and punishment by the other player in the event of significant statistical deviation. With this approach, it is necessary that the probability that an honest player will be punished unjustly can be made arbitrarily small. As we will demonstrate with the following proposition and a counter-example to a variation on this proposition, such a control process is unlikely in general for Markovian behavior that is carried out essentially on a countable state space.

If $S$ is a finite or countable set, let $\Delta(S)$ stand for the space of probability distributions on $S$. A Markov chain is defined by a finite or countable state space $S$ and, for every $s \in S$ and stage $i \ge 0$, a probability distribution $p_s^i \in \Delta(S)$ governing the distribution of the state at stage $i + 1$, given that $s$ is the state at stage $i$. It is time homogeneous if $p_s^i$ is independent of $i$.
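As a small sketch of the definition (our own illustration, with hypothetical names), the code below propagates the distribution of the state one stage at a time. The transitions are supplied as a function of the stage $i$; the chain is time homogeneous exactly when that function ignores $i$.

```python
def step(dist, transition):
    """One stage: dist'(t) = sum_s dist(s) * transition[s][t]."""
    n = len(dist)
    return [sum(dist[s] * transition[s][t] for s in range(n)) for t in range(n)]


def distribution_after(dist, transitions_at, stages):
    """Distribution of the state after `stages` stages.

    transitions_at(i) returns the row-stochastic matrix (p_s^i)_s used at stage i.
    """
    for i in range(stages):
        dist = step(dist, transitions_at(i))
    return dist


flip = [[0.0, 1.0], [1.0, 0.0]]
stay = [[1.0, 0.0], [0.0, 1.0]]
# Time homogeneous: the same matrix at every stage.
print(distribution_after([1.0, 0.0], lambda i: flip, 3))  # [0.0, 1.0]
# Not time homogeneous: move only on even stages.
print(distribution_after([1.0, 0.0], lambda i: flip if i % 2 == 0 else stay, 3))  # [1.0, 0.0]
```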

Proposition 4.2: Let $X$ be a finite space. For every $x \in X$ let $Y_x$ be a finite space, with $Y := \bigcup_{x \in X} Y_x$. (In the context of stochastic games, $X$ will be the state space and $Y_x$ will be the set of moves that a player has at the state $x \in X$.) There are probability transitions $(p_x \in \Delta(Y_x) \mid x \in X)$ from $X$ to $Y$, and there are probability transitions $(p_y \in \Delta(X) \mid y \in Y)$ from $Y$ to $X$, so that for every starting point $x \in X$ a time-homogeneous Markov chain on $X \cup Y$ is defined. On the even stages $i = 0, 2, 4, \dots$ the process is in $X$, and on the odd stages the process is in $Y$. Let there be an evaluation function $v : X \cup Y \to \mathbb{R}$ that is harmonic with respect to the transitions (meaning that a martingale is formed). Let $M > 0$ be a uniform bound for the maximal difference between all values of $v$. For every pair $x \in X$ and $y \in Y_x$ such that $y$ is reached from $x$ with positive probability (according to $p_x$), the difference between $v(y)$ and $v(x)$ is no more than $\delta > 0$.
Conclusion: If $|X| = n$, $\epsilon < 1/2$, and $\delta < \epsilon^3/(Mn)$, then the probability that there exists an $l$ with $\sum_{i = 0, 2, \dots}^{l} \big(v(y_{i+1}) - v(x_i)\big) > \epsilon$ does not exceed $\epsilon$.

The complexity of the sets $Y_x$ plays no role in the proof of Proposition 4.2, and therefore the proposition could have many generalizations corresponding to variations in the structure of the $Y_x$.

To emphasize the importance of the finite number $|X|$, the following is a counter-example to Proposition 4.2 if we assume that the bound for $\delta$ is independent of the cardinality of $X$. Furthermore, if we consider processes that are not time homogeneous, it does not help if for every stage the maximal differences, summed over the states, add up to no more than $\delta$.

Consider a random walk on $n + 1$ positions such that at the left end (at position 0) the player receives an absorbing payoff of 0 and at the right end (at position $n$) an absorbing payoff of 1. The space $X$ is the set of $n + 1$ positions, and for every $x \in X$ the two-element set $Y_x$ consists of the two directions "left" and "right". Given any small $\delta > 0$, one can make $n$ large enough so that at every stage the change in expected payoff does not exceed $\delta$. Now reformulate the random walk so that at the $k$th stage of play there is no motion at any position $i$ with $i \not\equiv k \pmod{n-1}$, but at the position $k' \equiv k \pmod{n-1}$ there is an equal probability of $1/2$ of moving either to position $k' - 1$ or to position $k' + 1$. At each stage the sum over the states of the differences in expected payoffs remains no more than $\delta$, and yet we are no closer to satisfying the conclusion of the proposition. (With $n$ even and starting in the middle position with an expected payoff of $1/2$, for every small positive $\epsilon$, with probability close to $1/2$ there will be motion to a position with an expected payoff of at least $1/2 + 2\epsilon$.)
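The arithmetic behind the counter-example can be checked with a short exact computation, a sketch with hypothetical function names. For the symmetric walk on positions $0, \dots, n$ with $v(x) = x/n$, each move changes $v$ by exactly $1/n$, while the sum in Proposition 4.2 telescopes to $v(x_{l+2}) - v(x_0)$. By the standard gambler's-ruin formula, a walk started at $n/2$ reaches a position $b$ before position $0$ with probability $(n/2)/b$, so the probability of a gain exceeding $\epsilon$ does not shrink as $n$ grows, even though the per-stage gain $1/n$ does.

```python
import math
from fractions import Fraction


def proposition_threshold(eps, M, n_states):
    """Bound eps^3 / (M n) on the one-stage gain required by Proposition 4.2."""
    return eps ** 3 / (M * n_states)


def random_walk_gain(n, eps):
    """Counter-example walk on positions 0..n (n even), absorbing payoffs 0 and 1.

    Returns (delta, prob): delta = 1/n is the change of v(x) = x/n in one move,
    prob is the probability, from the middle position, that v ever exceeds
    1/2 + eps, i.e. that the walk reaches `target` before position 0.
    """
    v = [Fraction(x, n) for x in range(n + 1)]
    # v is harmonic for the symmetric walk at every interior position.
    assert all(v[x] == (v[x - 1] + v[x + 1]) / 2 for x in range(1, n))
    start = n // 2
    target = start + math.floor(eps * n) + 1  # smallest x with v(x) - 1/2 > eps
    # Gambler's ruin: P(hit target before 0 | start) = start / target.
    return Fraction(1, n), Fraction(start, target)


eps = Fraction(1, 10)
for n in (100, 1000, 10000):
    delta, prob = random_walk_gain(n, eps)
    print(n, float(delta), float(prob),
          delta < proposition_threshold(eps, 1, n + 1))
```

In every row the printed probability stays above $0.8$, far above $\epsilon$, while the last column shows that the per-stage gain $1/n$ never meets the $n$-dependent bound of the proposition.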

We expect no proof of approximate equilibrium existence for all non-zero-sum stochastic games with countable state spaces without a radically different approach. If a proof for countably many states can be found, its application to finite-state truncations of the countable-state game would provide approximate equilibria of the finite-state games such that the average number of stages before absorption would not explode as the finite number of these states increases. In the proof below, for a fixed $\epsilon$ there is no lower bound, determined by the number of states, on the rate at which an absorbing state is reached. Indeed, because such a proof would imply the existence of yet another alternative proof for finitely many states with dramatic absorption rate properties, we suspect that there is a counter-example. Furthermore, it is possible that the complexities from countably many states involved in a two-player counter-example could be mimicked by the introduction of more players in a stochastic game with finitely many states, yielding a counter-example to approximate equilibria in this context as well.

We suspect that approximate equilibrium existence for a broad class of two-person stochastic games played on countable state spaces must rest on a fundamental assumption: that there is a uniform bound on the number of states possible at any given stage of play. With a finite number of such positions, it is still not clear how appropriate Markovian behavior should be found. Even with only one non-absorbing position, the possible infinite variations, including the number of moves for each player and the order in which similar "types" may appear, make the problem formidable. At least the generalization of Lemma 4.1 to Markov chains that are not time homogeneous will be necessary. Another reason to present our alternative proof of Vieille's result is the hope that it will be relevant to this case, which we call the case of finitely many positions. If for each non-absorbing position one could find an appropriate common identity for an infinite subsequence of states occurring in that position, then the pigeonhole principle could be applied successfully. Throughout this paper, we comment on the case of finitely many positions.

Organization 

To execute our proof efficiently, we will assume that Player One has the ability to send signals to Player Two that are independent of the transitions in the game. The easiest way to formalize this property is to assume that every move of Player One at a non-absorbing state is paired with another move at the same state that is its identical copy with respect to the transitions. Without this assumption, the proof is formally more involved and less elegant, but essentially equivalent. In the section following the conclusion of the main proof, we prove the result without this signaling assumption.

The argument and the paper are organized as follows. 

Section 2 introduces the model of absorbing positive recursive stochastic 
games and the basic concepts of Markov chains. Additionally we introduce 
an important concept with regard to the movement between states, called 
taboo probabilities. A taboo probability is the probability that one moves 
from an initial state to some set of target states without travelling through 
some second set of "forbidden" states. 

Section 3 gives proofs of all the needed lemmas on Markov chains. The central lemma is Lemma 3.2; it states that when the motions at a multitude of states whose frequencies are only a small fraction of the total motion toward a fixed state are removed, the flow continues toward this fixed state with about the same or greater tendency.

Section 4 contains a proof of Proposition 4.2, which also establishes general sufficient conditions for the existence of approximate equilibria. We create new states from our old states, which we call situations; at most three situations are created from each original state. The method of creating the situations, introduced in Section 3, we call polarization. Except for the rare possibility of punishment, our behavior strategies will be stationary on the situations. Section 4 concludes with Theorem 1, a demonstration of sufficient conditions for approximate equilibrium existence in our games.

In Section 5 we introduce the state-specific discounted evaluation for the second player. We define the discounted evaluation such that the discounting rates are adjusted for states sufficiently close together, according to a metric determined by the strategies. We select a quantity $\bar{\epsilon}$ much smaller than $\epsilon$, and define the discounted evaluation so that moves with more than an $\bar{\epsilon}$ probability of non-return to the state are evaluated in an undiscounted way, and moves with a probability $\gamma < \bar{\epsilon}$ of no return are evaluated as if their probability of no return were $\gamma/\bar{\epsilon}$. Our choice for $\bar{\epsilon}$ is guided by Proposition 4.2.

A serious problem with the state-specific discounted evaluation is that the motivations of the second player at one state can be very different from those at another state. Essentially the second player becomes a multitude of players, one for each state. This allows the second player at some states to prefer moves that result in too slow a motion toward absorption, and therefore also in discounted evaluations below the zero-sum value. To avoid this problem, in Section 2 we define a new correspondence, called the "jump" correspondence, based upon stationary strategies optimal in the conventionally discounted game. The use of the jump correspondence by the second player results in fast absorption. The "best-reply" correspondence of the second player is a combination of the jump correspondence with a maximization of the state-specific discounted evaluation: when the discounted evaluation is too low, the jump correspondence is activated. For the first player, the undiscounted evaluation is used to define her "best-reply" correspondence. With the "best-reply" correspondences for both players defined, we demonstrate two important properties. Lemma 5.4 shows that at a fixed point the jump correspondence of the second player has only very limited influence on the play. Lemma 5.5 contains the key argument of our entire approach; it is used repeatedly to solve the most difficult problems. It shows that if there is a meaningful discrepancy between the discounted and undiscounted evaluations for the second player, then the second player seeks primarily motion with the fastest absorption rate.

The synthesis of the previous sections lies in Section 6. Theorem 2 proves that the conditions of Theorem 1 are always satisfied, implying the existence of approximate equilibria. Here we consider sets such that a significant proportion of all the motion leaving these sets comes from Player Two moves with payoffs for Player Two significantly below the set-average payoff. Fixing any such state in a set where such moves take place, we look at what happens when Player One stops playing all moves performed with frequencies small compared to the motion toward this special state. The result, to which Player One is indifferent, involves almost exclusively the use of similar such moves by Player Two, such that the players can travel between these moves without the danger that along the way Player Two prefers to provoke punishment over performing one of these moves. Ultimately we show that there is a convex combination of such moves that all yield the same payoff for Player Two and for which Player One is approximately indifferent.

In Section 7 we consider the problem of signaling, as described above; 
and in Section 8 we conclude in more detail with the problem of countably 
many states. 
2 Preliminaries



2.1 The Model 

Let $S$ be the set of states; $A$ is the subset of absorbing states and $\mathcal{N} = S \setminus A$ is the subset of non-absorbing states.

For every $s \in S$, $A_1^s$ and $A_2^s$ are the moves (pure actions) of the first and second players, respectively, at the state $s$. Without loss of generality, we assume that $|A_i^s| = 1$ for every $s \in A$ and $i = 1, 2$. Let $r^1 : A \to [-1/2, 1/2]$ and $r^2 : A \to [\omega, 1]$ be the first and second players' evaluations on absorbing states, respectively, with $0 < \omega < 1$. Let $m$ be the maximal number of moves for either player at any non-absorbing state, meaning $m = \max_{s \in \mathcal{N}} \max(|A_1^s|, |A_2^s|)$.

Let $p(t \mid s; a, b)$ be the probability of moving from $s$ to $t$ when $a \in A_1^s$ and $b \in A_2^s$ are played. Let $\underline{p}$ be defined by $\underline{p} := \min\{p(t \mid s; a, b) \mid s, t \in S,\ a \in A_1^s,\ b \in A_2^s,\ p(t \mid s; a, b) > 0\}$, the minimal non-zero transition probability. Notice that in the case of finitely many positions one has such a positive quantity for each stage. More relevant, however, would be a sequence $\underline{p}_i$ of positive quantities such that the series $\sum_i \underline{p}_i$ is divergent but sums toward infinity much more slowly than any divergent series of positive transition probabilities. Such a series is possible if there is a uniform bound on the number of moves. Additionally, the discount factor must be adjusted to this series (possibly with the discount factor equal to $1 - \delta \underline{p}_i$ if there is only one non-absorbing state).

Let $X := \prod_{s \in \mathcal{N}} \Delta(A_1^s)$ and $Y := \prod_{s \in \mathcal{N}} \Delta(A_2^s)$ be the spaces of stationary strategies of the players, with $X_s := \Delta(A_1^s)$ and $Y_s := \Delta(A_2^s)$. For $a \in A_1^s$, $b \in A_2^s$, $x_s \in X_s$ and $y_s \in Y_s$ we define $p(t \mid s; a, y_s)$, $p(t \mid s; x_s, b)$ and $p(t \mid s; x_s, y_s)$ in the appropriate linear or bilinear way. For any $s \in \mathcal{N}$, $x_s \in X_s$ and $a \in A_1^s$, the quantity $x^s_a$ will stand for the probability, as determined by $x_s$, that the move $a$ is used. The same applies for $b \in A_2^s$, $y_s \in Y_s$ and $y^s_b$. Define a pair $(x, y) \in X \times Y$ to be absorbing if from every start an absorbing state is reached with probability one.
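One possible encoding of these objects, a sketch with hypothetical names rather than anything prescribed by the paper, stores $p(t \mid s; a, b)$ as a dictionary keyed by state and move pair. The bound $m$, the minimal non-zero transition probability, and the bilinear extension $p(t \mid s; x_s, y_s) = \sum_{a, b} x^s_a\, y^s_b\, p(t \mid s; a, b)$ are then computed directly. The states `'win'` and `'lose'` in the example are invented absorbing states.

```python
# p[s][(a, b)] is a dict {t: probability} for a non-absorbing state s.
p = {
    's': {('T', 'L'): {'s': 0.5, 'win': 0.5},
          ('T', 'R'): {'s': 1.0},
          ('B', 'L'): {'s': 1.0},
          ('B', 'R'): {'lose': 0.25, 's': 0.75}},
}


def max_moves(p):
    """m: the largest number of moves of either player at a non-absorbing state."""
    return max(max(len({a for a, _ in p[s]}), len({b for _, b in p[s]})) for s in p)


def min_positive_transition(p):
    """Minimal non-zero transition probability over all states and move pairs."""
    return min(q for s in p for dist in p[s].values() for q in dist.values() if q > 0)


def mixed_transition(p, s, x_s, y_s):
    """Bilinear extension p(t | s; x_s, y_s); x_s, y_s map moves to probabilities."""
    out = {}
    for (a, b), dist in p[s].items():
        weight = x_s.get(a, 0.0) * y_s.get(b, 0.0)
        for t, q in dist.items():
            out[t] = out.get(t, 0.0) + weight * q
    return out


print(max_moves(p), min_positive_transition(p))  # 2 0.25
print(mixed_transition(p, 's', {'T': 0.5, 'B': 0.5}, {'L': 0.5, 'R': 0.5}))
```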

We will say that two positive quantities $a$ and $b$ differ by no more than a factor of $\gamma$, with $0 < \gamma < 1$, if $a \ge b(1 - \gamma)$ and $b \ge a(1 - \gamma)$.






2.2 Histories, Strategies, Equilibria 

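For every stage $i \ge 0$ and $s \in S$ define
$$\mathcal{H}_i^s := \{(s_0, a_0, b_0), (s_1, a_1, b_1), \dots, (s_{i-1}, a_{i-1}, b_{i-1}), s_i = s \mid \forall\, 0 \le k < i:\ a_k \in A_1^{s_k},\ b_k \in A_2^{s_k},\ p(s_{k+1} \mid s_k; a_k, b_k) > 0\},$$
with $\mathcal{H}_0^s = \{s\}$ for all $s \in S$. Define $\mathcal{H}^s := \bigcup_{i \ge 0} \mathcal{H}_i^s$, $\mathcal{H}_i := \bigcup_{s \in S} \mathcal{H}_i^s$, $\mathcal{H} := \bigcup_{i=0}^{\infty} \mathcal{H}_i$, and
$$\overline{\mathcal{H}} := \{(s_0, a_0, b_0), (s_1, a_1, b_1), \dots \mid \forall\, i \ge 0 \text{ the truncation up to } s_i \text{ belongs to } \mathcal{H}_i\},$$
the set of infinite sequences.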

A strategy of Player $j = 1, 2$ is a collection of maps $\sigma_j = (\sigma_j^s \mid s \in \mathcal{N})$, with $\sigma_j^s$ a map from $\mathcal{H}^s$ to $\Delta(A_j^s)$ for all $s \in \mathcal{N}$.

With Blackwell games, a more general class than stochastic games, we assume that a player's evaluation on $\overline{\mathcal{H}}$ is a function that is measurable with respect to the Borel subsets of $\overline{\mathcal{H}}$, the sigma-algebra induced by the subsets of $\mathcal{H}_i$ for all $i \ge 0$. In case a stochastic game is recursive, it is easy to define an evaluation for both players for every member of $\overline{\mathcal{H}}$. Either the infinite sequence reaches an absorbing state and the players receive the corresponding absorbing payoffs, or it never reaches an absorbing state and both players receive a payoff of zero.

For every initial state $s$ and every pair of strategies $\sigma_1, \sigma_2$ for both players, a distribution is induced on $\overline{\mathcal{H}}$ in a natural way, resulting in two evaluations $V_j^s(\sigma_1, \sigma_2)$ for Player $j = 1, 2$ of the expected values of the $r^j$ on $\overline{\mathcal{H}}$. An $\epsilon$-equilibrium is a pair $\sigma_1, \sigma_2$ such that for all $s \in S$ and alternative strategies $\hat{\sigma}_1$ and $\hat{\sigma}_2$ it holds that $V_1^s(\hat{\sigma}_1, \sigma_2) \le V_1^s(\sigma_1, \sigma_2) + \epsilon$ and $V_2^s(\sigma_1, \hat{\sigma}_2) \le V_2^s(\sigma_1, \sigma_2) + \epsilon$. With absorbing positive recursive games and $\omega > 0$ the lowest Player Two absorbing payoff, we get the additional property that there exists an $N > 0$
such that with probability at least 1 — ^ the game has reached an absorbing 
state before the stage N. 

2.3 Jump Function 

For any real number $0 < \alpha < 1$ let $\mathcal{G}^\alpha$ be the conventionally defined discounted zero-sum game played against Player Two such that a visit to any state is discounted according to $1 - \alpha$, and let $\mathcal{G}^0$ be the corresponding undiscounted zero-sum game. For all positive $\alpha$ we define $c^\alpha : S \to \mathbb{R}$ to be the min-max value for Player Two in the zero-sum game $\mathcal{G}^\alpha$, with $c^\alpha(s) = r^2(s)$ for all $s \in A$. Because the game is positive recursive, the $c^\alpha$ are monotonically non-decreasing as $\alpha$ decreases, and due to Mertens and Neyman (1981) the pointwise limit is the undiscounted value of the game $\mathcal{G}^0$, though for this class of games there is an elementary proof. Player Two chooses a stationary optimal strategy of $\mathcal{G}^\alpha$ for an $\alpha > 0$ sufficiently small so that $c^\alpha$ is within $\epsilon$ of its pointwise limit, and at stage $i$ Player One chooses one of her optimal strategies in the game $\mathcal{G}^{\alpha_i}$, where for every $i \ge 0$, $c^{\alpha_i}$ is within $\epsilon/2^{i+2}$ of the pointwise limit and $\alpha_i < \epsilon/2^{i+2}$.

For every $x \in X$ and $0 < \alpha < 1$ define the jump function
$$j_x^\alpha(s) = (1 - \alpha) \max_{b \in A_2^s} \sum_{t \in S} p(t \mid s; x_s, b)\, c^\alpha(t),$$
the maximal payoff that Player Two can guarantee himself in the $(1 - \alpha)$-discounted game by being punished after the next stage if Player One uses $x$ at the present stage. If $s$ is an absorbing state, define $j_x^\alpha(s)$ to be $r^2(s)$ for all $\alpha$. For all states it is clear that $j_x^\alpha \ge c^\alpha$, with equality when $x$ is an optimal strategy for Player One in the zero-sum game $\mathcal{G}^\alpha$ played against Player Two.
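For every state $s \in \mathcal{N}$ and $x \in X$ define
$$J_x^\alpha(s) = \operatorname{argmax}_{b \in A_2^s} \sum_{t \in S} p(t \mid s; x_s, b)\, c^\alpha(t).$$
Let $n(s)$ denote the state following $s$, in our context a random variable. If $s$ is not an absorbing state and $b \in J_x^\alpha(s)$, then $j_x^\alpha(s) = (1 - \alpha) E_x^b\, c^\alpha(n(s)) \le (1 - \alpha) E_x^b\, j_x^\alpha(n(s))$, where $E_x^b$ is the expectation determined by the move $b$ and the strategy $x_s$, and the inequality uses $c^\alpha \le j_x^\alpha$. Since $j_x^\alpha \ge 0$, this also gives $j_x^\alpha(s) \le E_x^b\, j_x^\alpha(n(s))$. This makes $j_x^\alpha$ a sub-martingale.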
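A minimal sketch of the jump function and of $J^\alpha_x(s)$ at one non-absorbing state, assuming the discounted values $c^\alpha$ are already known (computing them requires solving the discounted zero-sum game, which is not shown). All names and numbers below are hypothetical; in particular the values in `c` are arbitrary and are not claimed to be the $c^\alpha$ of any actual game.

```python
def jump(p_s, x_s, c, alpha):
    """Return (j, J) at one non-absorbing state s.

    j = (1 - alpha) * max_b sum_t p(t | s; x_s, b) c(t) and J is the set of
    maximizing moves b. p_s[(a, b)] is a dict {t: probability}, x_s a dict
    a -> probability, and c a dict t -> c^alpha(t) (assumed already known).
    """
    moves_b = sorted({b for _, b in p_s})
    score = {b: sum(x_s.get(a, 0.0) * q * c[t]
                    for (a, b2), dist in p_s.items() if b2 == b
                    for t, q in dist.items())
             for b in moves_b}
    best = max(score.values())
    J = {b for b in moves_b if score[b] >= best - 1e-12}  # tolerance for rounding
    return (1 - alpha) * best, J


# Hypothetical data at a state 's': two moves per player, two absorbing states.
p_s = {('T', 'L'): {'s': 0.5, 'win': 0.5},
       ('T', 'R'): {'s': 1.0},
       ('B', 'L'): {'s': 1.0},
       ('B', 'R'): {'lose': 0.25, 's': 0.75}}
c = {'s': 0.6, 'win': 1.0, 'lose': 0.5}  # arbitrary illustrative values of c^alpha
j, J = jump(p_s, {'T': 0.5, 'B': 0.5}, c, alpha=0.01)
print(j, J)  # about 0.693 and {'L'}
```

With `alpha=0` and `c` replaced by the undiscounted values $c_2$, the same function computes the undiscounted jump function $j^2_z(s)$ of the next paragraph.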

For $i = 1, 2$ and a state $s \in S$ define $c_i(s)$ to be the value for Player $i$ of the zero-sum undiscounted game played against Player $i$ starting at the state $s$. For every Player $i$ and every stationary strategy $z$ of Player $k \ne i$ define the jump function $j_z^i : S \to \mathbb{R}$ by
$$j_z^2(s) = \max_{b \in A_2^s} \sum_{t \in S} p(t \mid s; z, b)\, c_2(t) \quad\text{or}\quad j_z^1(s) = \max_{a \in A_1^s} \sum_{t \in S} p(t \mid s; a, z)\, c_1(t),$$
the maximal payoff that Player $i$ can guarantee against $z$ if he or she is punished at the next stage.

2.4 Taboo probabilities 

For any time-homogeneous Markov chain, a state $s$, and two disjoint sets $A$ and $B$ of states, we introduce the "taboo" probability $P^A(s, B)$ to be the probability, with a start at the state $s$, of reaching the set $B$ before the set $A$ at some stage following the initial stage at $s$. With $\tau_C := \inf\{n \ge 1 \mid s_n \in C\}$, $P^A(s, B)$ measures the event that $\tau_B < \infty$ and $\tau_B < \tau_A$, conditioned on $s_0 = s$. If either set is a singleton, we can write its single member instead of the set. If there is ambiguity concerning which state space or which transitions are meant, we identify them with a subscript. In our context of stochastic games and stationary strategies, $P^A_{x,y}(s, B)$ will be the taboo probability corresponding to the time-homogeneous Markov chain generated by $(x, y) \in X \times Y$.
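The definition translates directly into a fixed-point computation. The sketch below is our own illustration (the function name `taboo_probability` and the matrix encoding are assumptions, not notation from the paper): it computes $P^A(s, B)$ by value iteration on the probabilities of reaching $B$ with $A$ made taboo, and then takes the first step from $s$ separately, since the initial stage does not count. In the example, the symmetric walk on positions $0, \dots, 4$ with absorbing ends is used to compute both a taboo probability and the probability $P^{\{0,4\}}(2, 2)$ of returning to the middle before absorption.

```python
def taboo_probability(P, s, A, B, tol=1e-13, max_iter=10**6):
    """P^A(s, B): probability, from s, of reaching B before A at some stage >= 1.

    P is a row-stochastic matrix (list of lists); A and B are disjoint sets of
    state indices. h[t] is the probability of reaching B before A from t at
    stages >= 0; iterating from the indicator of B increases h monotonically
    to this probability.
    """
    n = len(P)
    h = [1.0 if t in B else 0.0 for t in range(n)]
    for _ in range(max_iter):
        new = [1.0 if t in B else 0.0 if t in A else
               sum(P[t][u] * h[u] for u in range(n))
               for t in range(n)]
        done = max(abs(new[t] - h[t]) for t in range(n)) < tol
        h = new
        if done:
            break
    # The initial stage at s does not count, so take one step from s first.
    return sum(P[s][t] * h[t] for t in range(n))


# Symmetric random walk on 0..4; positions 0 and 4 are absorbing.
P = [[1, 0, 0, 0, 0],
     [0.5, 0, 0.5, 0, 0],
     [0, 0.5, 0, 0.5, 0],
     [0, 0, 0.5, 0, 0.5],
     [0, 0, 0, 0, 1]]
print(taboo_probability(P, 2, {0}, {4}))     # about 0.5: reach 4 before 0
print(taboo_probability(P, 2, {0, 4}, {2}))  # 0.5: return to 2 before absorption
```

The second value is $1 - a(2)$ in the notation of the absorption rate defined below, since $a(2) = P^{2}(2, \{0, 4\})$ for this absorbing chain.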

Define a state of a time homogeneous Markov chain to be absorbing if 
once this state is reached then the motion remains in this state forever. The 
Markov chain is absorbing if for any start with probability one an absorbing 
state is reached. 

Before moving toward the proof, we must present some basic notions 
using the taboo probabilities. These quantities will be defined first for time 
homogeneous Markov chains and then applied to the games. 

For any part $p$ of a transition at a state $s$, or an alternative transition $p$ for that state, define $g(p)$ to be the probability that there is no return to $s$ if $p$ is used at $s$ and the transitions remain constant at all other states. If $p$ was a part of the transition at $s$, then define $f_p$ to be the frequency with which $p$ is used at the state $s$. For every choice $(x, y) \in X \times Y$ and pair $a \in A_1^s$ and $b \in A_2^s$ of moves at the state $s \in \mathcal{N}$, $g_{x,y}(a, b)$ is the probability that there is no return to $s$ given that Player One and Player Two at $s$ play the actions $a$ and $b$, and that elsewhere in the future they play the stationary strategies $(x, y)$. For a move $b \in A_2^s$ of the second player, define $g^b_{x,y}$ to be $\sum_{a \in A_1^s} x^s_a\, g_{x,y}(a, b)$, and define $g^a_{x,y}$ for all $a \in A_1^s$ correspondingly.

Define the absorption rate $a(s)$ of a state $s$ to be the probability that after any visit to this state there is no return to this state, meaning that the absorption rate is the expected value of the function $g$. For the game, the absorption rate $a_{x,y}(s)$ of a state $s$ is $\sum_{a \in A_1^s,\, b \in A_2^s} x^s_a\, y^s_b\, g_{x,y}(a, b)$. Given that $(x, y)$ is absorbing, $a_{x,y}(s)$ would be the taboo probability $P^s_{x,y}(s, A)$.

For any part $p$ of the transition at a state $s$ define $\nu(p)$ to be the probability that at the last visit to $s$ the part $p$ was used, or equivalently $\nu(p) = f_p\, g(p)/a(s)$. We call this the importance of $p$. For a pair of moves $a \in A_1^s$ and $b \in A_2^s$ at $s \in \mathcal{N}$ and stationary strategies $(x, y)$, the importance $\nu_{x,y}(a, b)$ is $x^s_a\, y^s_b\, g_{x,y}(a, b)/a_{x,y}(s)$. For any move $a \in A_1^s$ define $\nu^a_{x,y}$ to be $\sum_{b \in A_2^s} \nu_{x,y}(a, b) = x^s_a\, g^a_{x,y}/a_{x,y}(s)$, and for any move $b \in A_2^s$ define $\nu^b_{x,y}$ in the same way.
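The quantities $g$, $a(s)$ and $\nu$ can be computed from the escape probabilities $esc(t, s)$ defined next, since by the strong Markov property the probability of no return after playing the pair $(a, b)$ is $\sum_t p(t \mid s; a, b)\, esc(t, s)$ with the convention $esc(s, s) = 0$. The sketch below is a hypothetical illustration under our own encoding: `P` holds the transitions at the states other than $s$ (already averaged over the stationary strategies), and `rows` holds the distribution of the next state for each pair of moves at $s$. It also checks that the importances sum to one, as they must, being the distribution of the pair used at the last visit to $s$.

```python
def hit_probability(P, target, tol=1e-13, max_iter=10**6):
    """h[t] = probability of ever reaching `target` from t (stage 0 included).

    Iterating from the indicator of `target` increases h monotonically to the
    hitting probabilities; P is a row-stochastic matrix (list of lists).
    """
    n = len(P)
    h = [1.0 if t == target else 0.0 for t in range(n)]
    for _ in range(max_iter):
        new = [1.0 if t == target else sum(P[t][u] * h[u] for u in range(n))
               for t in range(n)]
        done = max(abs(new[t] - h[t]) for t in range(n)) < tol
        h = new
        if done:
            break
    return h


def no_return_quantities(P, s, rows, x_s, y_s):
    """g(a, b), the absorption rate a(s) and the importances nu(a, b) at state s.

    P holds the transitions at the states other than s (its row s is ignored),
    rows[(a, b)] is the next-state distribution when a and b are played at s,
    and x_s, y_s map moves to probabilities. Assumes a(s) > 0.
    """
    esc = [1.0 - h for h in hit_probability(P, s)]  # esc(t, s), with esc(s, s) = 0
    g = {ab: sum(q * esc[t] for t, q in enumerate(row)) for ab, row in rows.items()}
    rate = sum(x_s[a] * y_s[b] * g[(a, b)] for (a, b) in rows)
    nu = {(a, b): x_s[a] * y_s[b] * g[(a, b)] / rate for (a, b) in rows}
    return g, rate, nu


# States: 0 = s, 1 = t (returns to s or absorbs with probability 1/2 each), 2 = absorbing.
P = [[1, 0, 0], [0.5, 0, 0.5], [0, 0, 1]]
rows = {(0, 0): [0, 1, 0],      # move to t
        (0, 1): [0, 0, 1],      # absorb at once
        (1, 0): [1, 0, 0],      # stay at s
        (1, 1): [0, 0.5, 0.5]}  # half to t, half absorb
g, rate, nu = no_return_quantities(P, 0, rows, [0.5, 0.5], [0.5, 0.5])
print(g)     # {(0, 0): 0.5, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.75}
print(rate)  # 0.5625
```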

For any distinct pair $s, t$ of states define $esc(t, s)$ to be the probability of never reaching $s$ with a start at $t$. (esc stands for "escape".) For the game we have $g^b_{x,y} = \sum_{t \in S,\, t \ne s} p(t \mid s; x_s, b)\, esc_{x,y}(t, s)$. (If $(x, y)$ is absorbing, $esc_{x,y}(t, s)$ is $P^s_{x,y}(t, A)$ and is different from $P^{\{s,t\}}_{x,y}(t, A)$, the probability of absorbing before returning to either $s$ or $t$.)

For distinct states $s$ and $t$ let $\mu(s, t)$ be $esc(s, t) + esc(t, s)$, and otherwise let $\mu(s, s) = 0$. Then $\mu$ is a metric on the state space. Recognize $1 - esc(t, s)$ as the probability of moving from $t$ to $s$; for mutually distinct $u, v, w$ we have $1 - esc(u, w) \ge (1 - esc(u, v))(1 - esc(v, w)) \ge 1 - esc(u, v) - esc(v, w)$, so that $esc(u, w) \le esc(u, v) + esc(v, w)$, which gives the triangle inequality for $\mu$.

Given that the Markov chain is absorbing with $A$ the set of absorbing states, the following relations for states $s\neq t$ are easy to verify:

$$\mathrm{esc}(s,t)=\frac{P^{\{s,t\}}(s,A)}{P^{s}(s,t)+P^{\{s,t\}}(s,A)}=\frac{P^{\{s,t\}}(s,A)}{1-P^{A\cup\{t\}}(s,s)} \qquad (1)$$

$$a(s)=P^{s}(s,t)\,\mathrm{esc}(t,s)+P^{\{s,t\}}(s,A) \qquad (2)$$

which imply

$$P^{s}(s,t)\,\mu(s,t)\le a(s)\le\mu(s,t)\quad\text{and}\quad a(t)\,P^{s}(s,t)\le a(s). \qquad (3)$$
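The relations above lend themselves to a quick numerical check. The following Python sketch is only an illustration, not part of the argument: it assumes a hypothetical four-state chain (states 2 and 3 absorbing), a helper `taboo_prob` introduced here for the purpose, and the reading of $P^T(x,B)$ as the probability, starting at $x$, of entering $B$ at some time $\ge 1$ before entering $T$ at any time $\ge 1$. It evaluates both sides of (1) and (2), the inequalities in (3), and the triangle inequality for $\mu$.

```python
import itertools
import numpy as np

# Hypothetical toy chain: states 0 and 1 are non-absorbing, 2 and 3 absorbing.
P = np.array([[0.2, 0.5, 0.3, 0.0],
              [0.4, 0.3, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
A = {2, 3}


def taboo_prob(P, absorbing, T, x, B):
    """P^T(x, B): from x, enter B at some time >= 1 before entering T at any
    time >= 1 (T and B disjoint; absorbing states outside B count as failure)."""
    T, B = set(T), set(B)
    if not B:
        return 0.0
    blocked = T | B | set(absorbing)
    free = [y for y in range(len(P)) if y not in blocked]
    phi = np.zeros(len(P))               # phi(y) = P(hit B before T | at y)
    phi[sorted(B)] = 1.0
    if free:
        Q = P[np.ix_(free, free)]
        rhs = P[free][:, sorted(B)].sum(axis=1)
        phi[free] = np.linalg.solve(np.eye(len(free)) - Q, rhs)
    return float(P[x] @ phi)             # first step from x, then phi


def esc(t, s):
    """Probability of never reaching s from t (t != s)."""
    return taboo_prob(P, A, {s}, t, A - {s})


def a(s):
    """Absorption rate of s: probability of no return to s."""
    return taboo_prob(P, A, {s}, s, A)


def mu(s, t):
    return 0.0 if s == t else esc(s, t) + esc(t, s)


s, t = 0, 1
Pst = taboo_prob(P, A, {s}, s, {t})       # P^s(s, t)
PstA = taboo_prob(P, A, {s, t}, s, A)     # P^{s,t}(s, A)
ret = taboo_prob(P, A, A | {t}, s, {s})   # P^{A u {t}}(s, s)

eq1 = (esc(s, t), PstA / (Pst + PstA), PstA / (1 - ret))
eq2 = (a(s), Pst * esc(t, s) + PstA)
eq3 = (Pst * mu(s, t) <= a(s) <= mu(s, t), a(t) * Pst <= a(s))
triangle = all(mu(u, w) <= mu(u, v) + mu(v, w) + 1e-12
               for u, v, w in itertools.permutations(range(len(P)), 3))
print(eq1, eq2, eq3, triangle)
```

On this chain $a(0)=18/35$ and $\mathrm{esc}(1,0)=3/7$; both sides of (1) equal $3/8$.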

For all these quantities and the following ones, we drop the subscripts and superscripts when there is no ambiguity.



2.5 Evaluations 

Earlier we extended the values $r^i:A\to\mathbb{R}$ on the absorbing states to functions $r^i$ on all paths in $\mathcal{H}$. For any stationary strategies $(x,y)$ and players $i=1,2$, extend the definition of $r^i$ again to a harmonic function $r^i_{x,y}:S\to\mathbb{R}$, with $r^i_{x,y}(s)$ equal to the expected value of $r^i$ on the paths starting at $s$, as determined by $(x,y)$.
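As a concrete illustration of this extension, here is a minimal sketch, assuming hypothetical game data `q[s, a, b, t]` (the probability of moving to $t$ after the moves $(a,b)$ at $s$): stationary strategies induce the Markov chain $p(s,t)=\sum_{a,b}x^s_a\,y^s_b\,q(t\mid s,a,b)$, and $r^i_{x,y}$ is then the unique solution of $r=Pr$ on the non-absorbing states with the absorbing values held fixed. The helper names `induced_chain` and `harmonic_extension` are introduced here only for illustration.

```python
import numpy as np


def induced_chain(q, x, y, absorbing):
    """Transition matrix induced by stationary strategies.
    q[s, a, b, t]: probability of moving to t after moves (a, b) at s;
    x[s, a], y[s, b]: mixed actions of Player One and Player Two at s."""
    P = np.einsum("sa,sb,sabt->st", x, y, q)
    for s in absorbing:
        P[s] = 0.0
        P[s, s] = 1.0                   # absorbing states stay put
    return P


def harmonic_extension(P, absorbing, r_abs):
    """Solve r = P r on non-absorbing states, with r = r_abs on absorbing ones."""
    absorbing = sorted(absorbing)
    free = [s for s in range(P.shape[0]) if s not in absorbing]
    r = np.zeros(P.shape[0])
    r[absorbing] = [r_abs[s] for s in absorbing]
    Q = P[np.ix_(free, free)]
    R = P[np.ix_(free, absorbing)]
    r[free] = np.linalg.solve(np.eye(len(free)) - Q, R @ r[absorbing])
    return r


# Hypothetical two-action game: states 0, 1 non-absorbing; 2, 3 absorbing.
q = np.zeros((4, 2, 2, 4))
q[0, 0, 0] = [0.5, 0.5, 0.0, 0.0]
q[0, 0, 1] = [0.0, 0.0, 1.0, 0.0]
q[0, 1, 0] = [0.0, 0.0, 0.0, 1.0]
q[0, 1, 1] = [0.0, 1.0, 0.0, 0.0]
q[1, 0, 0] = [0.5, 0.0, 0.5, 0.0]
q[1, 0, 1] = [1.0, 0.0, 0.0, 0.0]
q[1, 1, 0] = [0.0, 0.0, 0.0, 1.0]
q[1, 1, 1] = [0.0, 1.0, 0.0, 0.0]
x = np.full((4, 2), 0.5)               # both players mix uniformly
y = np.full((4, 2), 0.5)

P = induced_chain(q, x, y, absorbing={2, 3})
r = harmonic_extension(P, {2, 3}, {2: 1.0, 3: 0.0})
print(r)                               # r[0] = 5/11, r[1] = 13/33
```

With both players mixing uniformly the toy chain is absorbing, so the linear system has a unique solution.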

For any harmonic function $r$ on $S$ and any $p$ that is a part of, or an alternative to, the transition from a state $s$, define $v^r(p)$ to be the expected value of $r$ conditioned on the use of $p$ and no return to the state $s$, with $v^r(p)$ defined to be $r(s)$ if there is return to $s$ with certainty. If the Markov chain is absorbing and $g(p)>0$, then $v^r(p)$ would be the new harmonic function value for $s$ if the transition from $s$ were replaced by $p$. For every pair of moves $a\in A_1^s$ and $b\in A_2^s$, $v^i_{x,y}(a,b)$ is defined to be $v^{r^i_{x,y}}$ of the part of the transition defined by the pair $(a,b)$ of moves. Likewise define $v^i_{x,y}(a)$ and $v^i_{x,y}(b)$ with respect to the pairs $(a,y^s)$ and $(x^s,b)$, respectively. If $(x,y)$ is absorbing, we have the following relations. For $a\in A_1^s$ we have

$$v^i_{x,y}(a):=\sum_{b\in A_2^s}y^s_b\,v^i_{x,y}(a,b)\,g_{x,y}(a,b)/g^a_{x,y}=\sum_{b}\nu_{x,y}(a,b)\,v^i_{x,y}(a,b)/\nu^a_{x,y},$$

and for $b\in A_2^s$ we have

$$v^i_{x,y}(b):=\sum_{a\in A_1^s}x^s_a\,v^i_{x,y}(a,b)\,g_{x,y}(a,b)/g^b_{x,y}=\sum_{a}\nu_{x,y}(a,b)\,v^i_{x,y}(a,b)/\nu^b_{x,y},$$

with both quantities equal to $r^i_{x,y}(s)$ when the quotient is not well defined.

For any harmonic function $r$ on $S$ and any $p$ that is a part of, or an alternative to, the transition from a state $s$, define $w^r(p)$ to be the expected value of $r$ on the following stage according to the one-time use of $p$ on that stage. We have $w^r(p)=g(p)\,v^r(p)+(1-g(p))\,r(s)$. For any pair of moves $a\in A_1^s$ and $b\in A_2^s$ at $s\in S$ and $i=1,2$, $w^i_{x,y}(a,b)$ is the expected value of $r^i_{x,y}$ on the next stage if the players use the pair $a$ and $b$ on the present stage at $s$. For all $b\in A_2^s$ define $w^i_{x,y}(b):=\sum_{a\in A_1^s}x^s_a\,w^i_{x,y}(a,b)$, and for all $a\in A_1^s$ define $w^i_{x,y}(a):=\sum_{b\in A_2^s}y^s_b\,w^i_{x,y}(a,b)$.
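The identity $w^r(p)=g(p)\,v^r(p)+(1-g(p))\,r(s)$, and the earlier remark that $v^r(p)$ is the new harmonic value at $s$ when $p$ replaces the transition there, can both be checked numerically. In the sketch below (hypothetical toy chain and alternative transition $p$; helper names introduced for illustration only), $g(p)$ is computed as $\sum_z p(z)\,\mathrm{esc}(z,s)$ with $\mathrm{esc}(s,s)=0$, and $v^r(p)$ directly as a conditional expectation, independently of the identity.

```python
import numpy as np

# Hypothetical toy chain: 0, 1 non-absorbing; 2, 3 absorbing with r = 1 and 0.
P = np.array([[0.2, 0.5, 0.3, 0.0],
              [0.4, 0.3, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
A = [2, 3]
r_abs = np.array([1.0, 0.0])


def harmonic(P):
    """Harmonic function of an absorbing chain with values r_abs on A."""
    free = [z for z in range(len(P)) if z not in A]
    r = np.zeros(len(P))
    r[A] = r_abs
    r[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)],
                              P[np.ix_(free, A)] @ r_abs)
    return r


def absorb_avoiding(P, s):
    """H[z, k]: probability, from z, of being absorbed at A[k] without visiting s.
    Row s is left at zero, matching the convention esc(s, s) = 0."""
    free = [z for z in range(len(P)) if z not in A and z != s]
    H = np.zeros((len(P), len(A)))
    H[A, np.arange(len(A))] = 1.0
    if free:
        H[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)],
                                  P[np.ix_(free, A)])
    return H


r = harmonic(P)                       # [7/12, 1/3, 1, 0]
s = 0
p = np.array([0.1, 0.3, 0.1, 0.5])    # hypothetical alternative transition at s
H = absorb_avoiding(P, s)
esc = H.sum(axis=1)                   # esc(z, s), zero at z = s
g = p @ esc                           # g(p): no return to s after using p once
v = (p @ H @ r_abs) / g               # v^r(p): E[r | p used, no return to s]
w = p @ r                             # w^r(p): expected r on the next stage

P_new = P.copy()
P_new[s] = p                          # p replaces the transition at s
print(w, g * v + (1 - g) * r[s], v, harmonic(P_new)[s])
```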

The following is a central lemma concerning the changes in a harmonic 
function. 

Lemma 2.1: Let $S$ be the finite state space of an absorbing time homogeneous Markov chain and $r:S\to\mathbb{R}$ a harmonic function. For every non-absorbing $s\in S$ let $p_s$ be an alternative transition at $s$ such that $g(p_s)>0$. Define a new time homogeneous Markov chain according to the $p_s$. Let $a^*:S\to[0,1]$ be the absorption rates corresponding to the new time homogeneous Markov chain and let $r^*:S\to\mathbb{R}$ be a harmonic function with respect to the new transitions such that $r^*$ agrees with $r$ on the absorbing states. If $|v^r(p_s)-r(s)|\le\delta_s$ and $a^*(s)\ge\varepsilon_s\,g(p_s)$ for $0<\varepsilon_s\le 1$ and all non-absorbing $s\in S$ (with $g(p_s)=a(s)$ if $p_s$ was the original transition at $s$), then the new Markov chain is absorbing and $|r^*(s)-r(s)|\le\sum_t\delta_t/\varepsilon_t$ for all states $s$.

Proof: The new Markov chain is absorbing because $a^*(s)>0$ for all non-absorbing $s\in S$. With a start at any state $s_0$, we can bound the change $|r^*(s_0)-r(s_0)|$ by the sum over all states $t\in S$ of the one-stage deviation at $t$ multiplied by the expected number of visits to the state $t$. The deviation from one visit to a state $t$ is bounded by $|w^r(p_t)-r(t)|$, and since $1/a^*(t)$ bounds the expected number of visits to the state $t$ (it is exactly that number with a start at $t$), the total deviation is bounded by $\sum_t|w^r(p_t)-r(t)|/a^*(t)$. Finally, $|w^r(p_t)-r(t)|=g(p_t)\,|v^r(p_t)-r(t)|$ implies $|w^r(p_t)-r(t)|/a^*(t)\le|v^r(p_t)-r(t)|/\varepsilon_t\le\delta_t/\varepsilon_t$. □
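The proof rests on the exact decomposition $r^*(s_0)-r(s_0)=\sum_t G^*(s_0,t)\,\bigl(w^r(p_t)-r(t)\bigr)$, where $G^*(s_0,t)$ (notation introduced here) is the expected number of visits to $t$ in the new chain, together with $G^*(s_0,t)\le 1/a^*(t)$. The following sketch, on hypothetical toy transitions, checks the decomposition and the intermediate bound $\sum_t|w^r(p_t)-r(t)|/a^*(t)$, which the lemma then weakens to $\sum_t\delta_t/\varepsilon_t$.

```python
import numpy as np

# Hypothetical toy data: original chain P and new transitions at states 0 and 1.
P = np.array([[0.2, 0.5, 0.3, 0.0],
              [0.4, 0.3, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
P_new = P.copy()
P_new[0] = [0.1, 0.4, 0.2, 0.3]
P_new[1] = [0.3, 0.3, 0.2, 0.2]
A, free = [2, 3], [0, 1]
r_abs = np.array([1.0, 0.0])


def harmonic(P):
    r = np.zeros(len(P))
    r[A] = r_abs
    r[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)],
                              P[np.ix_(free, A)] @ r_abs)
    return r


r, r_star = harmonic(P), harmonic(P_new)

# G[i, j]: expected visits to free[j] from free[i] in the new chain.
G = np.linalg.inv(np.eye(len(free)) - P_new[np.ix_(free, free)])
a_star = 1.0 / np.diag(G)                 # new absorption rates a*(t)
dev = P_new[free] @ r - r[free]           # one-stage deviations w^r(p_t) - r(t)

exact = G @ dev                           # r* - r = sum_t (visits to t) * deviation at t
bound = np.sum(np.abs(dev) / a_star)      # intermediate bound from the proof
print(r_star[free] - r[free], exact, bound)
```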
3 Changes in Taboo Probabilities



In all the lemmata of this section, $S$ is the finite state space of a time homogeneous Markov chain.

3.1 Reaching a State 

For the first three lemmata we look at what happens when a fraction of $P^{t}(t,s)$ is removed from the transitions at all $t$ in a set $T$.

Lemma 3.1: Let $s$ and $t$ be two distinct states and $A$ and $B$ two subsets of states such that $A$, $B$ and $\{s,t\}$ are mutually disjoint. Let $p$ be a part of the transition at $t$ such that at least a positive fraction $\gamma<1$ of $P^{B\cup\{t\}}(t,A)$ goes through $p$ (meaning that if the complement of $p$ were removed and replaced by motion that went back to $t$ on the next stage with certainty, then the new quantity for $P^{B\cup\{t\}}(t,A)$ would be at least $\gamma$ times the old quantity). If the existing transition at $t$ were replaced by $p$ (followed by normalization) and the new transitions were indexed by $*$, then $P_*^{B\cup\{t\}}(t,A)\ge\gamma P^{B\cup\{t\}}(t,A)$ and $P_*^{B\cup\{s\}}(s,A)\ge\gamma P^{B\cup\{s\}}(s,A)$.

Proof: $P_*^{B\cup\{t\}}(t,A)\ge\gamma P^{B\cup\{t\}}(t,A)$ is given. If there were never motion from $s$ to $t$ or from $t$ to $s$, then the inequality $P_*^{B\cup\{s\}}(s,A)\ge\gamma P^{B\cup\{s\}}(s,A)$ would also be straightforward. So let us assume that there is some motion in both directions between $s$ and $t$, and let $A'$ be the set $A$ together with all the other states from which there is no motion to either $s$ or $t$.

To estimate $P_*^{B\cup\{s\}}(s,A)$, let $b:=P^{B\cup\{s,t\}}(s,A)$, $c:=P^{B\cup A'\cup\{s\}}(s,t)$, $d:=P^{B\cup A'\cup\{t\}}(t,s)$ and $e:=P^{B\cup\{s,t\}}(t,A)$. Let $d^*$ and $e^*$ stand for the contributions to $d$ and $e$ made by the transitions in $p$, so that $d^*\le d$ and $e^*\le e$. By assumption we have $e^*+\frac{d^*b}{b+c}\ge\gamma\bigl(e+\frac{db}{b+c}\bigr)$, that is, $be^*+bd^*+ce^*\ge\gamma(be+bd+ce)$. Suppose for the sake of contradiction that

$$\gamma P^{B\cup\{s\}}(s,A)=\gamma\Bigl(b+\frac{ce}{d+e}\Bigr)>b+\frac{ce^*}{d^*+e^*}=P_*^{B\cup\{s\}}(s,A).$$

This can be rewritten as $\gamma(d^*+e^*)(be+bd+ce)>(be^*+bd^*+ce^*)(d+e)$, which together with the assumption gives $(d^*+e^*)(be^*+bd^*+ce^*)>(be^*+bd^*+ce^*)(d+e)$, or $d^*+e^*>d+e$, a contradiction. □

Lemma 3.2: Let $T$ and $A\cup U$ be disjoint subsets of $S$. If no more than a frequency of $\gamma P^{T\cup\{u\}}(u,A)$ is removed from the transitions of all $u\in U\setminus A$, for some fraction $0<\gamma<1/(2|U|)$, and no more than a frequency of $\gamma$ in the case of $u\in U\cap A$, followed by normalization, then for all $x\in S\setminus A$ the new resulting probabilities $P_*^{T\cup\{x\}}(x,A)$ satisfy $P_*^{T\cup\{x\}}(x,A)\ge(1-\gamma|U|)\,P^{T\cup\{x\}}(x,A)$, and for every $a\in A$ and $x\notin A\cup T$ we have $(1-3|U|\gamma)\,P_*^{T\cup A}(a,x)\le P^{T\cup A}(a,x)$.

Proof: For $U=\emptyset$ there is nothing to prove. Now assume the result for $U\setminus\{u\}$, and let $P_+$ stand for the probabilities where the changes are made in $U\setminus\{u\}$. Since by induction $P_+^{T\cup\{u\}}(u,A)\ge(1-\gamma|U|+\gamma)\,P^{T\cup\{u\}}(u,A)$, the frequency removed at $u$ is no more than $\frac{\gamma}{1-\gamma|U|+\gamma}\,P_+^{T\cup\{u\}}(u,A)$. By Lemma 3.1, applied to the case of only one change at $u$, we have for all $x$

$$P_*^{T\cup\{x\}}(x,A)\ge\Bigl(1-\frac{\gamma}{1-\gamma|U|+\gamma}\Bigr)P_+^{T\cup\{x\}}(x,A)\ge\bigl(1-\gamma|U|+\gamma-\gamma\bigr)P^{T\cup\{x\}}(x,A)=(1-\gamma|U|)\,P^{T\cup\{x\}}(x,A).$$

For the second half, if $u\in A$ then it follows by induction, because the only way to increase this probability is through the normalization. Otherwise express $P_*^{T\cup A}(a,x)$ as

$$P_*^{T\cup A}(a,x)=P_*^{T\cup A\cup\{u\}}(a,x)+P_*^{T\cup A\cup\{x\}}(a,u)\,\frac{P_*^{T\cup A\cup\{u\}}(u,x)}{1-P_*^{T\cup A\cup\{x\}}(u,u)}$$

and notice that $1-P_+^{T\cup A\cup\{x\}}(u,u)\ge P_+^{u}(u,T\cup A\cup\{x\})\ge P_+^{T\cup\{u\}}(u,A)$, so that the change from $1-P_+^{T\cup A\cup\{x\}}(u,u)$ to $1-P_*^{T\cup A\cup\{x\}}(u,u)$ cannot be a decrease by more than a factor of $\gamma/(1-\gamma|U|+\gamma)\le 2\gamma$. The rest follows from $(1-\gamma)\,P_*^{T\cup A\cup\{u\}}(u,x)\le P_+^{T\cup A\cup\{u\}}(u,x)$ (since the only way to increase this probability is through the normalization). □

Lemma 3.3: Let $T$ be a subset of $S$ and let $s$ be a fixed state such that $s$ is reached with positive probability from every $t\in T$. For every $t\in T$ let $q^t$ be a part of the transition at the state $t$ satisfying $f_{q^t}\,P^{t}_{q^t}(t,s)\le\gamma P^{t}(t,s)$, where $P^{t}_{q^t}(t,s)$ is the resulting taboo probability if $q^t$ is a replacement transition at $t$. Consider new transitions resulting from the removal of the part $q^t$ at every $t\in T$, followed by normalization. If $|T|\gamma<1$, then $s$ is also reached with positive probability from all of $T$ after the changes.

Proof: We prove this by induction on the size of $T$; by Lemma 3.1 the claim holds for $|T|=1$. With $v\in T$ also fixed, let us assume that there is some state $u\in T$ such that after the changes, from a start at $v$, the state $u$ is not reached at all. Whether or not one reaches $s$ from $v$ with the changes cannot be influenced by any change made at $u$. Therefore by the induction hypothesis, considering changes made in the smaller set $T\setminus\{u\}$, we have our result.

Now assume that with the changes all members of $T$ are reached from $v$. For every pair $t,u\in T$ let $w_t(u)$ be the probability in the original Markov chain, with respect to a start at $t$, that $s$ is reached and that the last visit to a state in $T$ was at the state $u$. Because starting at $t$ rather than at $u$ cannot be a better way to reach $s$ through the state $u$, we have $w_t(u)\le w_u(u)$. But then there must be a $u\in T$ such that $w_u(u)\ge\frac{1}{|T|}\sum_t w_t(t)\ge\frac{1}{|T|}\sum_t w_u(t)$. This means that at least a fraction $\frac{1}{|T|}$ of the original motion $P^u(u,s)$ went directly to $s$ without passing through any other member of $T$ (and therefore after the changes there is still motion from $u$ to $s$). □

The following lemma concerns transitions in two-person stochastic games, but can be generalized to any time homogeneous Markov chain whose transitions are determined by two independent variables.

Lemma 3.4: Let $R$ be a subset of non-absorbing states, $U$ a subset of $R$, and $(x,y)$ a pair of stationary strategies such that there is some motion between all pairs of states in $R$. Let $s,t\in U$ be special states. Assume for every $u\in U\setminus\{s\}$ that no more than a frequency of $\gamma P^u(u,s)$ is removed from $x^u\in X^u$ and no more than a frequency of $\gamma$ from $x^s$, followed by normalization; let $\bar x$ stand for the result. Assume for the state $t\in U$ that $P^s_{\bar x,y}(s,t)\ge\varepsilon P^s_{x,y}(s,t)$. Let $y''_u$ be a part of $y^u$ for any $u\in U$, with $f''_u$ its frequency. Assume for all $u\in U$ and both $z\in\{s,t\}$ that $f''_u\,P_{x,(y_{-u},y''_u)}(u,z)\le\delta P_{x,y}(u,z)$, where $(y_{-u},y''_u)$ is the strategy that is $y^v$ when $v\neq u$ and is $y''_u$ otherwise. Let $\bar y$ stand for the result when $y''_u$ is removed from $y^u$ for every $u\in U$, followed by normalization. Given that $(1-4\gamma|U|)\varepsilon>\delta|U|$, with $(\bar x,\bar y)$ there is some motion from all states in $R$ to $s$ and also some motion from $s$ to $t$.

Proof: Since the part of $P^u(u,s)$ that was removed cannot exceed a fraction $\delta+\gamma$ of the whole, we have from Lemma 3.3 that $s$ is reached from all states $v$ in $R$.

As in the proof of Lemma 3.3, we can assume by induction that all $u\in U\setminus\{t\}$ are reached from $s$ with $\bar x$ and $\bar y$. We account for $P^s_{\bar x,\bar y}(s,t)$ by considering the last state of $U$ visited on the way from $s$ to $t$. For any choice of $(x,y)$ let $p_{x,y}(u,t):=P^{U\setminus\{t\}}_{x,y}(u,t)$ be the probability of moving from $u$ to $t$ with no other member of $U$ in between. Let $U':=U\setminus\{s,t\}$. We have

$$P^s_{x,y}(s,t)=p_{x,y}(s,t)+\sum_{u\in U'}\frac{P^{\{s,t\}}_{x,y}(s,u)}{1-P^{\{s,t\}}_{x,y}(u,u)}\,p_{x,y}(u,t),$$

since $P^{\{s,t\}}_{x,y}(s,u)/\bigl(1-P^{\{s,t\}}_{x,y}(u,u)\bigr)$ is the expected number of times that $u$ is visited before reaching $t$ or returning to $s$, with $1-P^{\{s,t\}}_{x,y}(u,u)=P^u_{x,y}(u,\{s,t\}\cup A)\ge P^u_{x,y}(u,s)$, where $A$ is the set of states from which there is no motion to the set $R$.

Define for all $u\in U'$

$$e(u):=\frac{P^{\{s,t\}}_{x,y}(s,u)}{1-P^{\{s,t\}}_{x,y}(u,u)},$$

with $e(s)=1$, and define $e_*(u)$ correspondingly with respect to $\bar x$ and $y$, with $e_*(s)=1$. By Lemma 3.2 we have $(1-4\gamma(|U|-2))\,e_*(u)\le e(u)$ for all $u\in U'$. We can conclude that

$$\sum_{u\in U\setminus\{t\}}e(u)\,p_{\bar x,y}(u,t)\ge\varepsilon\bigl(1-4(|U|-1)\gamma\bigr)P^s_{x,y}(s,t)=\varepsilon\bigl(1-4(|U|-1)\gamma\bigr)\sum_{u\in U\setminus\{t\}}e(u)\,p_{x,y}(u,t). \qquad (4)$$

Next define $p_{x,\bar y}(u,t):=P^{U\setminus\{t\}}_{x,\bar y}(u,t)$. By recognizing that $p_{x,y}(u,t)\,e(u)/P^s(s,t)$, the probability that the last visit to $U$ was at $u\in U$ from a start at $s$, is less than or equal to the probability that the last visit to $U$ was at $u$ with a start at $u$ (both according to $(x,y)$), we have from the defining condition on $\bar y$ that $|p_{x,\bar y}(u,t)\,e(u)-p_{x,y}(u,t)\,e(u)|\le\delta P^s(s,t)$. After summing over $U\setminus\{t\}$ we get

$$\sum_{u\in U\setminus\{t\}}e(u)\,p_{x,\bar y}(u,t)\ge(1-\delta|U|+\delta)\sum_{u\in U\setminus\{t\}}e(u)\,p_{x,y}(u,t). \qquad (5)$$

To show that $t$ is reached from $u$ for some $u\in U\setminus\{t\}$, it suffices to show that $p_{x,\bar y}(u,t)+p_{\bar x,y}(u,t)>p_{x,y}(u,t)$ for some $u\in U\setminus\{t\}$. But assuming that $p_{x,\bar y}(u,t)+p_{\bar x,y}(u,t)\le p_{x,y}(u,t)$ for all $u\in U\setminus\{t\}$, from the above sums in (4) and (5) we must conclude that $1-\delta|U|+\varepsilon(1-4\gamma|U|)\le 1$, a contradiction to the initial assumption. □

3.2 Continuity and Exiting 

Because of the unlimited number of stages, taboo probabilities and harmonic 
functions of time homogeneous Markov chains are not continuous with re- 
spect to absolute changes in transition probabilities. However, there is a con- 
tinuity for relative changes in these transitions. A result of the same spirit 
but in a different formal context is contained in Freidlin and and Wentzell 
(1984). 

Lemma 3.5: Assume that the transitions $p_s\in\Delta(S)$ at the states $s$ of a subset $U$ are changed such that for all $t\in S$, including $t=s$, the resulting $p'_s(t)$ differs from $p_s(t)$ by no more than a factor of positive $\gamma<1/(2|U|)$ (necessarily with $p'_s(t)=0$ if and only if $p_s(t)=0$). Let $P_*^T(s,A)$ stand for the resulting taboo probability. For all choices of $s$, $T$ and $A$ with $T\cap A=\emptyset$, $P^T(s,A)$ differs from $P_*^T(s,A)$ by a factor of at most $4\gamma|U|$. If the original Markov chain is absorbing, then the resulting Markov chain is absorbing, and if $r:S\to\mathbb{R}$ is a harmonic function with respect to the original Markov chain and $r^*$ is the resulting harmonic function that agrees with $r$ on all the absorbing states, then $|r(s)-r^*(s)|\le 4\gamma|U|M$ for every $s\in S$, where $M$ is a bound on the difference between the function values of $r$ on these absorbing states.

Proof: Let $U:=\{s_1,\dots,s_N\}$. Let $P_i^T(s,A)$ stand for the taboo probability when the changes are made only at the subset $\{s_1,s_2,\dots,s_i\}$, and define $\mathrm{esc}_i(t,s)$ in the same way.

First we claim that for every fixed choice of $s$, $T$, $A$ with $s\in T$, $P_i^T(s,A)$ and $P_{i-1}^T(s,A)$ differ at most by a factor of $2\gamma$. Since both $P_i^T(s_i,A)$ and $P_{i-1}^T(s_i,A)$ are expectations over the next stage of some probabilities, we have our claim for $P_i^T(s_i,A)$, with a factor of $\gamma$, by the defining assumption. If $s\neq s_i$, then we get our result from the same observation and the formula

$$P_i^T(s,A)=P_i^{T\cup\{s_i\}}(s,A)+P_i^{T\cup A}(s,s_i)\,\frac{P_i^{T\cup\{s_i\}}(s_i,A)}{P_i^{s_i}(s_i,T\cup B\cup A)},$$

where $B$ is the set of states from which there is no motion to $s_i$ in either of the two Markov chains being compared.

From formula (1), applied with the roles of $s$ and $t$ exchanged, we have $1-\mathrm{esc}_N(t,s)=P_N^{t}(t,s)/\bigl(P_N^{t}(t,s)+P_N^{\{s,t\}}(t,B)\bigr)$, and from the above that $1-\mathrm{esc}_N(t,s)$ does not differ from $1-\mathrm{esc}(t,s)$ by more than a factor of $2\gamma N$. Notice that $1-a(s)$ can be written as the expected value of $1-\mathrm{esc}(t,s)$ on the next stage, and therefore $1-a(s)$ does not differ from $1-a_N(s)$ by more than a factor of $2\gamma N$, where $a_N$ is the resulting absorption rate. This implies that $a(s)=1$ if and only if $a_N(s)=1$, and in this case we have $P^T(s,A)=P^{T\cup\{s\}}(s,A)$, $P_N^T(s,A)=P_N^{T\cup\{s\}}(s,A)$, and our result. Given $a(s)\neq 1$, then by $P^T(s,A)=P^{T\cup\{s\}}(s,A)/(1-a(s))$ and $P_N^T(s,A)=P_N^{T\cup\{s\}}(s,A)/(1-a_N(s))$ we also have our result. The claim concerning harmonic functions follows by considering $A$ to be any subset of absorbing states. □
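Lemma 3.5 can be probed on a small example. The sketch below is illustrative only and makes two assumptions not spelled out in the text: "differs by a factor of at most $\gamma$" is read as $|p'_s(t)-p_s(t)|\le\gamma\,p_s(t)$, and the change of a taboo probability is measured relatively, as $|P_*^T(s,A)-P^T(s,A)|/P^T(s,A)$. It perturbs the two non-absorbing rows of a hypothetical chain by relative amounts of at most $\gamma=0.05$, enumerates all disjoint pairs $(T,B)$ and all starting states, and compares the worst relative change, and the change in the harmonic function, with $4\gamma|U|$.

```python
import itertools
import numpy as np

# Hypothetical toy chain; the rows at U = {0, 1} change by relative amounts <= 0.05.
P = np.array([[0.2, 0.5, 0.3, 0.0],
              [0.4, 0.3, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
P_star = P.copy()
P_star[0] = [0.21, 0.49, 0.30, 0.0]
P_star[1] = [0.39, 0.30, 0.0, 0.31]
gamma, U_size = 0.05, 2               # gamma < 1 / (2 |U|)
A = {2, 3}
r_abs = {2: 1.0, 3: 0.0}              # differences of r on A are bounded by M = 1


def taboo_prob(P, T, x, B):
    """P^T(x, B) in an absorbing chain (T and B disjoint)."""
    if not B:
        return 0.0
    blocked = T | B | A
    free = [y for y in range(len(P)) if y not in blocked]
    phi = np.zeros(len(P))
    phi[sorted(B)] = 1.0
    if free:
        rhs = P[free][:, sorted(B)].sum(axis=1)
        phi[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)], rhs)
    return float(P[x] @ phi)


def harmonic(P):
    """r(s) = sum over absorbing k of P(absorbed at k | start s) * r(k)."""
    return np.array([r_abs[s] if s in A else
                     sum(taboo_prob(P, set(), s, {k}) * val for k, val in r_abs.items())
                     for s in range(len(P))])


worst = 0.0
for labels in itertools.product((0, 1, 2), repeat=len(P)):   # 1: in T, 2: in B
    T = {i for i, lab in enumerate(labels) if lab == 1}
    B = {i for i, lab in enumerate(labels) if lab == 2}
    for x in range(len(P)):
        old, new = taboo_prob(P, T, x, B), taboo_prob(P_star, T, x, B)
        if old > 1e-12:
            worst = max(worst, abs(new - old) / old)
        else:
            assert new < 1e-12                                  # zeros are preserved

harm_gap = float(np.max(np.abs(harmonic(P) - harmonic(P_star))))
print(worst, harm_gap, 4 * gamma * U_size)
```

On this instance both quantities stay well below $4\gamma|U|=0.4$; a single example of course illustrates rather than proves the lemma.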

Next we define the concept of an exit. (Due to the lack of the semi-algebraic analysis, we will be more restrictive in our definition of an exit than Vieille (2000a) or Solan (2000).) For any subset $P$ of non-absorbing states, a system of exits from $P$ is a collection of parts of the transitions at the states in $P$ such that all motion from $P$ to $S\setminus P$ must occur through one of these parts. Each part in the collection is called an exit. Given that the Markov chain is absorbing, any subset of non-absorbing states must have a system of exits.

Assume that there is a partition $\mathcal{P}$ of the states such that $\{s\}\in\mathcal{P}$ for every absorbing $s$, and for every non-absorbing $s\in P\in\mathcal{P}$ let $q_s\in\Delta(S)$ be the transition defined conditionally by the union of all the exits from $P$ at the state $s$. Let $A$ be the set of absorbing states. For every $P\in\mathcal{P}$ let $s_P\in P$ be a representative for the set $P$. We will create two new time homogeneous Markov processes, one by extending the state space and the other by contracting it. These constructions are also in Vieille (2000c).

First we extend the state space. For every $s\in P\in\mathcal{P}$, create two new states $s^a$ and $s^b$. Define $S^*:=\{s^a\mid s\in A\}\cup\bigcup_{s\in S\setminus A}\{s^a,s^b\}$; the corresponding Markov chain will be indexed by $*$. The states $\{s^a\mid s\in A\}$ remain absorbing. At $s^a$ with $s\in P\in\mathcal{P}$, the motion goes deterministically to $s_P^b$. At $s^b$ the transition is labeled $p^*_s\in\Delta(S^*)$. Let $f_s$ be the frequency with which $q_s$ is used. Let $\hat p_s$ be the transition defined by $p_s$ conditioned on the non-use of $q_s$, given of course that $f_s\neq 1$. Define $p^*_s(t^a)=f_s\,q_s(t)$ and $p^*_s(t^b)=(1-f_s)\,\hat p_s(t)$ (and otherwise zero if $\hat p_s$ is not defined), with $p^*_s(a)=p_s(a)$ if $a\in A$.

Given that the Markov chain is absorbing, we next contract the state space. Define $S_\#:=\{s_P\mid P\in\mathcal{P}\}$. A previously absorbing state remains absorbing. For every non-absorbing state $s_P$ let the transition at $s_P$ be induced by the distribution on the next state $t^a$ following $s_P^b$ in the above Markov chain defined on $S^*$. If $t^a$ is absorbing, then $t$ is that next state. If $t^a$ is not absorbing, then $u=s_{P'}$ is the next state, where $t\in P'\in\mathcal{P}$. Since the Markov chain on $S^*$ is absorbing, the transitions of $S_\#$ are well defined modulo events of zero probability. In a different context (without taboo probabilities), a statement similar to the next lemma was proven by Vieille (2000c).

Lemma 3.6: Assume that the Markov chain is absorbing. Let $r$ be a harmonic function on $S$ and $M>0$ a uniform bound on all differences in the values of $r$. Let $N$ be the number of members of $\mathcal{P}$ that are not singletons, and let $0<\delta<\ldots$ be given. Assume for every $P\in\mathcal{P}$ and every distinct pair $s,t\in P$ that the probability of moving from $t$ to $s$ without passing through any exit of $P$ is at least $1-\delta$. The new processes on $S^*$ and $S_\#$ are absorbing, and for any pair of subsets $A$ and $T$ that are unions of members of $\mathcal{P}$ with $A\cap T=\emptyset$, we have that $P_*^{A^*}(s^a,T^*)$ differs from $P^A(s,T)$ by no more than a factor of $4N\delta$, where $B^*:=\{s^a,s^b\mid s\in B\}$ for all subsets $B$. With $r^*$ representing the new harmonic function on $S^*$ determined by the expected value of $r$ on the absorbing states, and $r_\#$ the same for $S_\#$, we have $r^*(s_R^b)=r_\#(s_R)$ for all representative states $s_R$ and $|r^*(s^a)-r(s)|\le 4MN\delta$ for all $s\in S$.

Proof: Define two new transitions $(\bar p_s\mid s\in S)$ and $(\tilde p_s\mid s\in S)$ on $S$. $\bar p_s$ is determined by the distribution on the next state $t^a$ in $S^*$ from a start at $s^a\in S^*$; $\tilde p_s$ is defined likewise, but from a start at $s^b\in S^*$. The distribution on the states outside of $P$ with $\tilde p_s$ is the same as with the original transitions $p_s$ on $S$. Because of our assumption concerning the avoiding of exits, Lemma 3.5 applies to the difference between $\bar p$ and $\tilde p$. The claim for the taboo probabilities follows directly from Lemma 3.5, as does the claim for the harmonic functions. □

Lemma 3.6 works because it is based upon the rare use of an exit. Much 
more problematic is analysing the consequences of the certain use of an exit. 
This is the content of Lemma 3.7. 

Lemma 3.7: Assume the context of Lemma 3.6 and that $p$ is an exit from $P$ at $t\in P$ with $g(p)>0$. We have

1) $|g(p)-g_\#(p)|\le 4N\delta+\delta$,

2) $g(p)$ and $g_\#(p)$ differ by a factor of no more than $4N\delta+\frac{2\delta}{\nu(p)}$,

3) $\nu(p)$ and $\nu_\#(p)$ differ by a factor of no more than $4N\delta+2\delta+\frac{(4N+1)\delta}{g(p)}$,

4) $|\nu(p)-\nu_\#(p)|\le 8N\delta+4\delta$,

5) $|v^r(p)-v^r_\#(p)|\le M\min\{8N\delta+\ldots,\ 8N\delta+\ldots\}$.

Proof: 1) We define $\bar g$ to be the probability that there is no return to the set $P$ after using the exit $p$ in the original Markov chain. From Lemma 3.6 we see that $\bar g$ is within a factor of $4N\delta$ of $g_\#(p)$. From the avoiding of exits we get that $|\bar g-g(p)|\le\delta$, which suffices.

2) By definition $g(p)\ge\bar g$. First we show that $\mathrm{esc}(u,t)\le\delta\bar g/((1-\delta)\,\nu(p))$ for all $u\in P$. Define $w_u$ to be the probability that $p$ will be used before returning to $u$ from a start at $t$ (with $w_u\le\delta$ for all $u\in P$). Define $\nu_u$ to be the probability that the last visit to $P$ is through the exit $p$ from a start at $u$; we have $\nu_u\le w_u\bar g/(w_u\bar g+\mathrm{esc}(u,t))$, which translates to $\mathrm{esc}(u,t)\le w_u\bar g/\nu_u$. Finally, notice that $\nu_u$ does not differ from $\nu(p)$ by a factor of more than $\delta$.

Next we compare $g(p)$ with $\bar g$. For every $u\in P$ let $\lambda_u$ be the probability that there is a return to $P$ from the use of $p$ in the original Markov chain and that $u$ is the first member of $P$ reached. Notice that $\sum_u\lambda_u=1-\bar g$. We have $g(p)=\bar g+\sum_u\lambda_u\,\mathrm{esc}(u,t)$. This suffices for $(1-2\delta/\nu(p))\,g(p)\le\bar g$. Now use Lemma 3.6 for the conclusion.

3) By definition $\nu^*(p)=f_p\,g^*(p)/a^*(t^b)$ and $\nu(p)=f_p\,g(p)/a(t)$. One way to perceive $a(t)$ is as the reciprocal of the expected number of visits to $t$ from a start at $t$. With this perspective, by Lemma 3.6 and the avoiding of exits, we get that $a^*(t^b)$ and $a(t)$ do not differ by a factor of more than $4N\delta+\delta$. This means that if $g^*(p)$ and $g(p)$ do not differ by a factor of more than $\gamma$, then $\nu^*(p)$ and $\nu(p)$ do not differ by more than a factor of $\gamma+4N\delta+\delta$. Since $\nu_\#(p)$ is also equal to the probability that the last visit to $P$ starting at $s_P^b$ in the Markov chain on $S^*$ went through the exit $p$, we have that $\nu^*(p)$ is within a factor of $\delta$ of $\nu_\#(p)$, and therefore $\nu_\#(p)$ and $\nu(p)$ do not differ by a factor of more than $\gamma+4N\delta+2\delta$. By the same argument as in Part 1, comparing $g_\#(p)$ with $g(p)$, we get $|g^*(p)-g(p)|\le 4N\delta+\delta$, and therefore $g^*(p)$ and $g(p)$ cannot differ by a factor of more than $(4N+1)\delta/g(p)$, which gives our conclusion.

4) The argument of Part 2 can be repeated with the Markov chain defined on $S^*$ instead of the original on $S$. The quantity $g(p)$ would be replaced by $g^*(p)$ and $\bar g$ would be replaced by $g_\#(p)$. We have $g^*(p)\ge g_\#(p)$ and $g^*(p)=g_\#(p)+(1-g_\#(p))\,\mathrm{esc}^*(s_P^b,t^b)$.

If $g(p)\ge g^*(p)$, we need only $g^*(p)\ge g_\#(p)$ and the conclusion of Part 2 to get $g^*(p)\ge(1-2\delta/\nu(p)-4\delta N)\,g(p)$. Combined with the arguments from Part 3, we have our goal. On the other hand, if $g^*(p)>g(p)$, we get our result from repeating Part 2 for $g^*(p)$ and $g_\#(p)$, the same arguments of Part 3, plus the claim that $(1-4\delta N-\delta)\,\mathrm{esc}^*(s_P^b,t^b)\le\mathrm{esc}(s_P,t)$.

$\mathrm{esc}^*(s_P^b,t^b)$ is no more than $(w+w^2+\cdots)\,h^*$, where $w$ is the probability of reaching an exit of $P$ from $s_P^b$ before returning to $t^b$, and the quantity $h^*$ is the expected value of $g_\#$ conditioned on the use of one of these exits. On the other hand, $\mathrm{esc}(s_P,t)$ is at least $wh$, where $h$ is the probability of no return to the set $P$ in the original Markov chain conditioned on the use of one of these exits. From Lemma 3.6 we have that $h$ and $h^*$ differ by no more than a factor of $4\delta N$. That $w\le\delta$ completes the proof of the claim.

5) From the proof of Part 1 we had that $\bar g\ge(1-\delta/g(p))\,g(p)$, and from Part 2 that $\bar g\ge(1-2\delta/\nu(p))\,g(p)$. The rest follows from Lemma 3.6. □

Part 4 of Lemma 3.7 is remarkable because the sum of $\nu$ over all transitions in a set $P$ will be $|P|$ rather than something close to one.
3.3 Polarization



We call polarization the process described below: the transitions are changed through a convex combination of two transitions, one giving a higher value and the other a lower value of a harmonic function, with the convex combination yielding the same value.

Lemma 3.8: Let $s$ and $t$ be two non-absorbing states of an absorbing Markov chain.

(i) Let $p$ be a part of the transition at $t$ such that $\nu(p)\ge\varepsilon>0$.

(ii) Let $p$ be a replacement transition at $t$ such that $g(p)\ge\varepsilon$.

(iii) Let $p$ be a transition at $t$ that is a convex combination of transitions as described in (i) and (ii).

In all three cases above, if we replace the transitions at $t$ by $p$, using normalization in the case of (i) or (iii), the resulting process is absorbing and the absorption rate of $s$ is at least $\varepsilon$ times what it was before the changes were made.

Proof: Let $b$, $c$, $d$ and $e$ stand for the same quantities as in the proof of Lemma 3.1, with $A$ the set of absorbing states and $B$ the empty set.

(i) It follows from Lemma 3.1.

(ii) Let $a^*(s)$, $d^*$ and $e^*$ be the corresponding quantities when $p$ is the transition at $t$. By assumption, $g(p)=e^*+\frac{d^*b}{b+c}\ge\varepsilon$. Suppose for the sake of contradiction that $\varepsilon\bigl(b+\frac{ce}{d+e}\bigr)=\varepsilon a(s)>a^*(s)=b+\frac{ce^*}{d^*+e^*}$. Then we have

$$be^*+ce^*+bd^*\ge(b+c)\,\varepsilon\ge\varepsilon\Bigl(b+\frac{ce}{d+e}\Bigr)>\frac{bd^*+be^*+ce^*}{d^*+e^*}.$$

This implies $d^*+e^*>1$, also a contradiction.

(iii) First we may assume that $b<\varepsilon a(s)$, since otherwise there would be nothing to prove. Let $a_i$, $d_i$ and $e_i$ for $i=1,2$ stand for the resulting probabilities from (i) and (ii), respectively, after normalization in the case of (i). With the convex combinations $d:=\lambda d_1+(1-\lambda)d_2$ and $e:=\lambda e_1+(1-\lambda)e_2$ being the new transition quantities, our desired result is equivalent to $\frac{e}{d+e}\ge\frac{\varepsilon a(s)-b}{c}$. This follows from (i), (ii), and the fact that $\frac{x_1}{y_1}\ge z$ and $\frac{x_2}{y_2}\ge z$ imply $\frac{\lambda x_1+(1-\lambda)x_2}{\lambda y_1+(1-\lambda)y_2}\ge z$ for all non-negative quantities $x_1,x_2,z$, positive $y_1,y_2$, and $0\le\lambda\le 1$. □

Proposition 3.9: Let $r^1$ and $r^2$ be two harmonic functions, and assume that the Markov chain is absorbing. Let $N$ be the number of non-absorbing states. Let 1 be a uniform bound on all differences in the values of $r^1$ and $r^2$. Let $w^1$, $w^2$, $v^1$ and $v^2$ stand for $w^{r^1}$, $w^{r^2}$, $v^{r^1}$ and $v^{r^2}$, respectively.

Let $1/2>\varepsilon>\delta>\gamma>0$, with $\delta<\ldots$. Let $p^*_s$ be a part of the transition at $s$ such that $w^2(p^*_s)\le r^2(s)-\varepsilon$ (including the possibility that $p^*_s$ is empty). Assume that if $\nu(p^*_s)>\gamma$, then there is an alternative transition $p_s$ at $s$ such that $w^2(p_s)\le r^2(s)-\varepsilon$ and $|v^1(p_s)-r^1(s)|\le\delta$, and there exists another part $q_s$ of the transition at $s$ such that $q^c_s$, the complement of the union of $q_s$ with $p^*_s$, satisfies $(v^2(q^c_s)-r^2(s))\,\nu(q^c_s)\le N\delta/\varepsilon$. For every subset $T\subseteq\{s\mid\nu(p^*_s)>\gamma,\ w^2(q_s)>r^2(s)\}$ define a new time homogeneous Markov chain by the transitions at $t\in T$ defined by $\lambda p_t+(1-\lambda)q_t$, with $\lambda$ satisfying $\lambda w^2(p_t)+(1-\lambda)w^2(q_t)=r^2(t)$, and furthermore for every $v\in S\setminus T$ the part $p^*_v$ is discarded, followed by normalization. Let the subscript $T$ stand for the quantities determined by the new transitions with the changes in $T$.

Conclusion: There is a subset $T\subseteq\{s\mid\nu(p^*_s)>\gamma,\ w^2(q_s)>r^2(s)\}$ such that the new process is absorbing and for both $i=1,2$ and all $s\in S$, $|r^i_T(s)-r^i(s)|\le\varepsilon$.
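The convex weight in the polarization step is explicit: for $t$ in the candidate set, $w^2(p_t)<r^2(t)<w^2(q_t)$, so the defining equation gives $\lambda=(w^2(q_t)-r^2(t))/(w^2(q_t)-w^2(p_t))\in(0,1)$, and $r^2$ stays harmonic for the polarized chain. The sketch below illustrates this at a single state of a hypothetical chain; as a simplifying assumption it treats $p_t$ and $q_t$ as full replacement transitions rather than parts.

```python
import numpy as np

# Hypothetical toy chain: 0, 1 non-absorbing; 2, 3 absorbing with values 1 and 0.
P = np.array([[0.2, 0.5, 0.3, 0.0],
              [0.4, 0.3, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
A, free = [2, 3], [0, 1]
r_abs = np.array([1.0, 0.0])


def harmonic(P):
    r = np.zeros(len(P))
    r[A] = r_abs
    r[free] = np.linalg.solve(np.eye(len(free)) - P[np.ix_(free, free)],
                              P[np.ix_(free, A)] @ r_abs)
    return r


r = harmonic(P)                       # [7/12, 1/3, 1, 0]
t = 0
p = np.array([0.0, 0.5, 0.0, 0.5])    # hypothetical transition with w(p) < r(t)
q = np.array([0.0, 0.2, 0.8, 0.0])    # hypothetical transition with w(q) > r(t)
w_p, w_q = p @ r, q @ r
lam = (w_q - r[t]) / (w_q - w_p)      # solves lam*w(p) + (1 - lam)*w(q) = r(t)

P_pol = P.copy()
P_pol[t] = lam * p + (1 - lam) * q    # polarized transition at t
print(w_p, w_q, lam, harmonic(P_pol))
```

Because the next-stage expectation at $t$ is unchanged and the new chain is still absorbing, the harmonic function with the same absorbing values is unchanged as well.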

Proof: First we consider what happens when the changes are made only at a set $T$ (meaning that the part $p^*_v$ is kept for $v\notin T$), which we will label with $T,*$. Because $r^2$ remains a harmonic function after the changes are made and there is always a positive probability at all states in $T$ that the harmonic function drops by $\varepsilon$, the resulting time homogeneous Markov chain is absorbing with $r^2_{T,*}(s)=r^2(s)$ for every $s\in S$.

Next we must determine which subset $T$ will be chosen. Choose any $t_1$ such that $\nu(p^*_{t_1})>\varepsilon^2/2N$, and put $t_1$ in $T$. If there exists no such $t\in S$, then let $T$ be the empty set. At any set $T$ with $|T|=k-1$ formed so far, put into $T$ any $t_k$ such that $\nu_{T,*}(p^*_{t_k})>\varepsilon^2/2N$, and stop if there is no such new state $t_k$.

Claim: For any set $T$ that has already been chosen and any $t\notin T$ that could be added to $T$, we have $a_{T\cup\{t\},*}(u)\ge\frac{\varepsilon^3}{3N}\,a_{T,*}(u)\ge\bigl(\frac{\varepsilon^3}{3N}\bigr)^{|T|+1}a(u)$ for all $u\in S$, $g_{T,*}(q_t)\ge\ldots$, and $w^2(q_t)>r^2(t)$.

Proof of Claim: Assume that $t$ will be added to $T$. Look at the transition $q^c_t$ and the identities

$$w^2_{T,*}(q^c_t)-r^2_{T,*}(t)=w^2(q^c_t)-r^2(t)=\bigl(v^2_{T,*}(q^c_t)-r^2_{T,*}(t)\bigr)g_{T,*}(q^c_t)=\bigl(v^2(q^c_t)-r^2(t)\bigr)g(q^c_t),$$

which follow from the fact that $r^2$ remains the harmonic function. Consider the definitions $\nu_{T,*}(q^c_t)=f_{q^c_t}\,g_{T,*}(q^c_t)/a_{T,*}(t)$ and $\nu(q^c_t)=f_{q^c_t}\,g(q^c_t)/a(t)$; they show that the new absorption rate alone determines the new value $\bigl(v^2_{T,*}(q^c_t)-r^2_{T,*}(t)\bigr)\nu_{T,*}(q^c_t)$. From the induction assumption we must conclude that

$$\nu_{T,*}(q^c_t)\bigl(v^2_{T,*}(q^c_t)-r^2(t)\bigr)\le\Bigl(\frac{3N}{\varepsilon^3}\Bigr)^{|T|}\nu(q^c_t)\bigl(v^2(q^c_t)-r^2(t)\bigr)\le\Bigl(\frac{3N}{\varepsilon^3}\Bigr)^{|T|}\frac{N\delta}{\varepsilon}<\frac{\varepsilon^3}{6N}.$$

If $\hat q_t$ is the union of $q^c_t$ with $p^*_t$, then from $\nu_{T,*}(p^*_t)>\varepsilon^2/2N$ and $w^2_{T,*}(p^*_t)\le r^2(t)-\varepsilon$ we get that $\nu_{T,*}(\hat q_t)\bigl(v^2_{T,*}(\hat q_t)-r^2(t)\bigr)<\varepsilon^3/(6N)-\varepsilon^3/(2N)=-\varepsilon^3/(3N)$, which implies that $w^2(q_t)>r^2(t)$ and $\nu_{T,*}(q_t)>\varepsilon^3/(3N)$.

Next suppose, for the sake of contradiction, that $g_{T,*}(q_t)<\ldots$. Since $\nu(q_t)=f_{q_t}\,g(q_t)/a(t)$, $\nu_{T,*}(q_t)=f_{q_t}\,g_{T,*}(q_t)/a_{T,*}(t)$ and $\nu_{T,*}(q_t)>\varepsilon^3/(3N)$, by the induction assumption we would be forced to accept $f_{q_t}>1$, an impossibility.

By Lemma 3.8 we have our claim on the absorption rates for all states other than $t$. For the state $t$ we have $g_{T,*}(q_t)\ge f_{q_t}\,g_{T,*}(q_t)=\nu_{T,*}(q_t)\,a_{T,*}(t)>\varepsilon^3a_{T,*}(t)/(3N)$. With $g_{T,*}(p_t)\ge\varepsilon$, our claim is proven.

With the claim, we conclude from Lemma 2.1 that $|r^1_{T,*}(s)-r^1(s)|\le\ldots<\varepsilon/2$ for all $s\in S$.

Next, we must show that it is impossible for any state $s$ that is not polarized to satisfy $\nu_{T,*}(p^*_s)>\varepsilon^2/2N$. This holds for all states with $\nu(p^*_s)>\gamma$, by construction. Let us assume that $\nu(p^*_s)\le\gamma$; this means that the probability of ever using $p^*_s$ in the original Markov chain cannot exceed $\gamma/\varepsilon$. But by the above claim we know additionally that the probability of using $p^*_s$ in the altered Markov chain indexed by $T,*$ cannot exceed $\ldots<\varepsilon^2/2N$.

Next we must consider the influence of the removed parts $p^*_s$ in the above Markov chain indexed by $T,*$. For any $s$ with $\nu_{T,*}(p^*_s)\le\varepsilon^2/2N$ the chance of ever using the transition $p^*_s$ cannot exceed $\varepsilon/2N$, and so these parts cannot contribute an average of more than $\varepsilon/2$ to either the function $r^1$ or $r^2$. □

4 From Markov Chains to Equilibria 

4.1 Application of the Doob-Kolmogorov Inequality 

We must prove Proposition 4.2, a cornerstone of our analysis. 

Lemma 4.1: Let $X$ be the finite state space of a time homogeneous Markov chain with probability transitions $(p_x\in\Delta(X)\mid x\in X)$. Let $v:X\to\mathbb{R}$ be a harmonic function and let $M>0$ be a bound for the maximal difference between all values of $v$.

For every $x\in X$ define the non-negative quantities $w(x)$ by $w(x)=\sum_{y\in X}p_x(y)\,|v(y)-v(x)|$. Let $n$ be the number of states $x$ such that $w(x)>0$. For any path $p=(x_0,x_1,x_2,\dots)$ in $X$ define $w(p)=\sum_{i=0}^{\infty}w(x_i)$.

Conclusion: The expected value of the function $w$ does not exceed $Mn$.

Proof: We isolate the problem, handling each state $x$ separately. Since $|v(y)-v(x)|$ is always less than or equal to $M$ times $\mathrm{esc}(y,x)$, we have $w(x)\le a(x)M$. Therefore the part of the sum that comes from visits to $x$ does not exceed $a(x)M\sum_{i=0}^{\infty}(1-a(x))^i=M$. □
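Lemma 4.1 can be checked directly on a small absorbing chain, where the expected value of $w$ along the path from $x_0$ equals $\sum_x G(x_0,x)\,w(x)$, with $G$ the expected-visits matrix (notation introduced here). In the sketch below (hypothetical toy chain), this expectation is compared with $Mn$, and the per-state inequality $w(x)\le a(x)M$ from the proof is checked as well, using $a(x)=1/G(x,x)$.

```python
import numpy as np

# Hypothetical toy chain: 0, 1 non-absorbing; 2, 3 absorbing.
P = np.array([[0.2, 0.5, 0.3, 0.0],
              [0.4, 0.3, 0.0, 0.3],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
A, free = [2, 3], [0, 1]

# Harmonic v with boundary values 1 and 0 on the absorbing states.
v = np.zeros(len(P))
v[A] = [1.0, 0.0]
Q = P[np.ix_(free, free)]
v[free] = np.linalg.solve(np.eye(len(free)) - Q, P[np.ix_(free, A)] @ v[A])

M = v.max() - v.min()                                  # bound on differences of v
w = (P * np.abs(v[None, :] - v[:, None])).sum(axis=1)  # w(x) = sum_y p_x(y)|v(y) - v(x)|
n = int(np.sum(w > 1e-12))                             # states with w(x) > 0

G = np.linalg.inv(np.eye(len(free)) - Q)               # expected visits among free states
expected_w = G @ w[free]                               # E[w(path)] for each free start
a = 1.0 / np.diag(G)                                   # absorption rates a(x)
print(expected_w, M * n, w[free], a * M)
```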

Proof of Proposition 4.2 (as stated in the introduction):

Define the random variable $r_i$ on the odd steps $i$ to be $v(y_i)-v(x_{i-1})$, and let $R_i$ be the sum of the $r_k$ for odd $k\le i$. For $y\in Y_x$ define $r(y)$ to be $v(y)-v(x)$.

Define a new quantity $\tilde w(x):=\sum_{y\in Y_x}p_x(y)\,|v(y)-v(x)|$. Let $w(x)$ be the old quantity on the Markov chain from Lemma 4.1 defined only on $X$: we ignore the visits to the $Y_x$ sets and consider only the motions from $X$ to $X$.

The Doob submartingale inequality states that if (Si | % = 0,1,..., n) 
is a martingale with zero expectation then for every n > 0, positive value 
c > and exponent p > 1 the probability that maxj<„ \ Si\ > c is less than 
E(|Sn| p )/c p (Williams 1991, Section 14.6). Since the martingale property 
implies that E(S^) is equal to the sum over all the stages 1 < i < n of E(sf) 
where Sj is the change in value between the i — 1st stage and the ith. stage, 
we have for every finite even and positive Q 

Probability (max^ > e) < ^E(^ P Xi ~ 1 (y)f(yY)- 

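By taking the limit as $Q$ goes to infinity and using $|r(y)| \le \delta$ we get

$$\Pr\Big(\max_{i<\infty}|R_i| > \epsilon\Big) \le \frac{1}{\epsilon^2}E\Big(\sum_{i<\infty,\; y \in Y_{x_{i-1}}} p_{x_{i-1}}(y)\,r(y)^2\Big) \le \frac{\delta}{\epsilon^2}E\Big(\sum_{i<\infty,\; y \in Y_{x_{i-1}}} p_{x_{i-1}}(y)\,|r(y)|\Big) = \frac{\delta}{\epsilon^2}E\Big(\sum_{i<\infty}\tilde w(x_{i-1})\Big).$$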

Since by the triangle inequality $\tilde w(x) \le w(x)$ for all $x$, we have

$$\Pr\Big(\max_{i<\infty}|R_i| > \epsilon\Big) \le \frac{\delta}{\epsilon^2}E\Big(\sum_{i<\infty} w(x_{i-1})\Big),$$

and by Lemma 4.1 this is no more than $\delta Mn/\epsilon^2$. So with $\epsilon < 1/2$, we obtain our result from the size of $\delta$. □
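The two facts used in this proof, Doob's inequality with $p = 2$ and the identity $E(S_n^2) = \sum_i E(s_i^2)$ for a martingale with zero expectation, can be verified exactly on a toy example. In this sketch a symmetric random walk with steps $\pm$`step` stands in for the process $R_i$; the walk and the name `doob_check` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def doob_check(n, step, c):
    """Exact check of Doob's inequality with p = 2 for a symmetric +/-step walk.

    The walk S_0 = 0, S_i = S_{i-1} +/- step (each with probability 1/2) is a
    martingale with zero expectation.  Enumerating all 2**n sign sequences gives
    P(max_{i<=n} |S_i| > c), E(S_n^2) and the Doob bound E(S_n^2) / c^2 exactly.
    """
    hits = 0
    second_moment = Fraction(0)
    for signs in product((-1, 1), repeat=n):
        s, peak = 0, 0
        for sign in signs:
            s += sign * step
            peak = max(peak, abs(s))
        if peak > c:
            hits += 1
        second_moment += s * s
    total = 2 ** n
    prob = Fraction(hits, total)
    m2 = second_moment / total
    return prob, m2, m2 / (c * c)


prob, m2, bound = doob_check(10, 1, 5)
print(float(prob), m2, float(bound))
```

Since the increments are independent with variance `step**2`, the second moment equals `n * step**2`, which matches the martingale identity used above.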



The difficulty in extending Proposition 4.2 to Markov chains that are not time homogeneous (or that have countably many states) lies in Lemma 4.1, not in the proof of Proposition 4.2.

The following corollary relates the above work on Markov chains to our two-person stochastic games. Because the application of this corollary involves an altered state space, this result should be understood in an abstract way.

Corollary 4.3: Let $(x,y) \in X \times Y$ be stationary absorbing strategies. Assume that

1) for both players $k = 1, 2$ and all $s \in S$, $r^k_{x,y}(s)$ is greater than $j^k_z(s) - \epsilon$, with $z = x$ if $k = 2$ and $z = y$ if $k = 1$, and that

2) for both players $k = 1, 2$ and all moves $c$ used with positive probability under $(x,y)$ by Player $k$ at a state $s$, the value $w^k_{x,y}(c)$ is within $\delta$ of $r^k_{x,y}(s)$.
Conclusion: For any positive $\epsilon < 1/2$, if $\delta$ is no more than $\ldots$, then the strategies $(x,y)$ generate a $4\epsilon$-equilibrium of the stochastic game.

Proof: We define the following strategy for Player $k$. For every starting point $s_0 \in S$ let $n_{s_0}$ be large enough that, with a start at $s_0$ and play according to $(x,y)$, the probability that there is no absorption before the $n_{s_0}$th stage is less than $\epsilon/10$. Let $s_0, s_1, \ldots$ be any sequence of states reached in the game and, for both $k$, let $c^k_0, c^k_1, \ldots$ be the sequence of moves made by Player $k$. For $k' \ne k$, as long as $\big|\sum_{i=0}^{l}\big(w^{k'}_{x,y}(c^{k'}_i) - r^{k'}_{x,y}(s_i)\big)\big| \le \epsilon$, the stage $l$ does not exceed $n_{s_0}$, and Player $k'$ never chooses $c^{k'}_l$ outside of the support of his stationary strategy, Player $k$ continues to act according to his stationary strategy. As soon as one of these conditions is violated at some stage $l$, from the next stage $l+1$ on both players punish each other according to the functions $c^1 + \epsilon$ and $c^2 + \epsilon$. (The mutual punishment is necessary because otherwise a player could intentionally prolong the game in order to punish the other player. The result can be extended to multi-player stochastic games if it can be determined who should punish whom in all situations!) That no player $k$ can obtain an expected payoff more than $2\epsilon$ above the function $r^k$ by choosing a different strategy is self-explanatory. That punishment occurs before absorption with probability no more than $2\epsilon$ if both players adhere to the suggested strategies follows from Proposition 4.2. □
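The monitoring rule in this proof can be sketched as a predicate on the observed history. This is a hypothetical encoding: the dictionaries `w`, `r` and `support` stand for $w^{k'}_{x,y}$, $r^{k'}_{x,y}$ and the support of Player $k'$'s stationary strategy, `horizon` plays the role of $n_{s_0}$, and we read the monitored condition as the running sum staying in $[-\epsilon, \epsilon]$. The punishment phase itself is not modelled.

```python
def keep_cooperating(states, moves, w, r, support, eps, horizon):
    """Monitoring rule sketched from the proof of Corollary 4.3.

    states[i], moves[i]: state s_i and the opponent's move c_i at stage i.
    w[(s, c)]: next-stage expected payoff after c at s; r[s]: expected payoff at s.
    support[s]: moves the opponent's stationary strategy uses at s.
    Returns False as soon as the stage exceeds `horizon` (the bound n_{s_0}),
    an off-support move appears, or the running sum of w - r leaves [-eps, eps].
    """
    if len(states) - 1 > horizon:
        return False
    drift = 0.0
    for s, c in zip(states, moves):
        if c not in support[s]:
            return False
        drift += w[(s, c)] - r[s]
        if abs(drift) > eps:
            return False
    return True


support = {"s": {"L", "R"}}
w = {("s", "L"): 0.6, ("s", "R"): 0.4}
r = {"s": 0.5}
print(keep_cooperating(["s"] * 4, ["L", "R", "L", "R"], w, r, support, 0.25, 10))
```

Alternating moves keep the running sum near zero, so cooperation continues; three consecutive `"L"` moves would push it past `eps`.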



4.2 Situations 

Next we create an expanded state space from the original state space through partitions of the histories. For every $s \in S$ let $\mathcal{P}_s$ be a partition of $\mathcal{H}_s$. Define $\mathcal{S}$ to be the disjoint union $\bigcup_{s \in S}\mathcal{P}_s$. For every $t \in \mathcal{S}$ let $b(t) \in S$ be the member of $S$ such that $t \in \mathcal{P}_{b(t)}$. A member of $\mathcal{S}$ we call a situation. We define the situations $\mathcal{S}$ to be normal if and only if the next $u \in \mathcal{S}$ following a $t \in \mathcal{S}$ is determined by the situation $t$, the choice of moves by the players at $t$, and the next $s \in S$ with $b(u) = s$. Normalcy implies that one can define a stochastic game on the situations as a new state space.
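Normality is a determinism condition, so on a finite record of transitions it can be checked directly. The sketch below is illustrative only. The encoding of a situation as a pair (state, label) and the function names are assumptions.

```python
def is_normal(transitions, b):
    """Check the normality condition on a finite list of observed transitions.

    Each item (t, moves, u) records that situation u followed situation t after
    the players chose `moves` at t; b(u) is the state of S underlying u.
    Normality requires that (t, moves, b(u)) determines u.
    """
    seen = {}
    for t, moves, u in transitions:
        key = (t, moves, b(u))
        if seen.setdefault(key, u) != u:
            return False
    return True


def state_of(situation):
    # Hypothetical encoding: a situation is a pair (state, label), e.g. ("s", "e").
    return situation[0]


normal = [(("s", "e"), ("a", "b"), ("t", "e")), (("s", "e"), ("a", "c"), ("t", "f"))]
print(is_normal(normal, state_of))
```

Adding a record in which the same situation, moves and next state lead to a different situation breaks normality.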

Corollary 4.4: Let the situations $\mathcal{S}$ be normal, and let absorbing stationary strategies $(x,y) \in \prod_{s \in \mathcal{S}}\Delta(A^1_{b(s)}) \times \prod_{s \in \mathcal{S}}\Delta(A^2_{b(s)})$ be defined on the situations $\mathcal{S}$, with $r^k_{x,y}: \mathcal{S} \to \mathbb{R}$ the expected payoff for Player $k$ as determined by the above stationary strategies and the functions $r^k$ on the absorbing states, and $w^k_{x,y}$ the corresponding expected value of $r^k_{x,y}$ on the next stage. Assume that

1) for every $s \in \mathcal{S}$, $r^k_{x,y}(s) > j^k_z(b(s)) - \epsilon$, where $z = x$ if $k = 2$ and $z = y$ if $k = 1$, and

2) for every move $c$ used with positive probability at a situation $s$ by Player $k$, $|w^k_{x,y}(c) - r^k_{x,y}(s)| < \delta$.

If $\delta$ is no more than $\ldots$, then these stationary strategies generate a $4\epsilon$-equilibrium of the original stochastic game.

Proof: Because the normality of $\mathcal{S}$ defines a stochastic game on the situations and the conditions of Corollary 4.3 are preserved, the result follows from Corollary 4.3. □

4.3 First Main Theorem 

For any subset $R \subseteq \mathcal{N}$ and a state $s \in R$, a pair $a \in A^s_1$ and $b \in A^s_2$ of moves is called a primitive exit from the set $R$ if with positive probability there is motion from $s$ to $S \setminus R$ using the pair $a$ and $b$. By the definition of $p$, any use of a primitive exit at $s$ results in a probability of at least $p$ of reaching the complement of $R$.

For every subset $B$ of Player Two moves in a set $R$ we define a $B$ exit (or simply exit if there is no ambiguity) from $R$ to be any pair $(a,b)$ of moves at an $s \in R$ such that $(a,b)$ is already a primitive exit from $R$ or $b \in B$. Let $E_B(R)$ stand for the set of all $B$ exits from $R$.

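Define $B^\gamma_{x,y}(s)$ to be those moves $b$ of Player Two at the state $s$ with $w^1_{x,y}(b) < r^1_{x,y}(s) - \gamma$, and let $B^\gamma_{x,y}(R)$ be the union of all the $B^\gamma_{x,y}(s)$ for $s \in R$. For every $s \in \mathcal{N}$ define $z^\gamma_{x,y}(s) := \sum_{b \in B^\gamma_{x,y}(s)} y_b$. For any subset $R \subseteq \mathcal{N}$ define $z^\gamma_{x,y}(R) := \sum_{s \in R} z^\gamma_{x,y}(s)$.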
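The definitions of primitive exits, $B$ exits, $B^\gamma_{x,y}(s)$ and $z^\gamma_{x,y}(s)$ translate directly into set computations. In this sketch (a hypothetical encoding) transitions are stored as `trans[(s, a, b)]`, a dictionary of next-state probabilities, and `w1`, `r1` hold the values $w^1_{x,y}(b)$ and $r^1_{x,y}(s)$.

```python
from fractions import Fraction as F


def primitive_exits(trans, R):
    """(s, a, b) with s in R whose transition leaves R with positive probability.

    trans[(s, a, b)] is a dict mapping next states to probabilities.
    """
    return {
        key for key, dist in trans.items()
        if key[0] in R and any(p > 0 and nxt not in R for nxt, p in dist.items())
    }


def b_exits(trans, R, B):
    """E_B(R): primitive exits from R, plus pairs at states of R whose Player Two move is in B."""
    return primitive_exits(trans, R) | {key for key in trans if key[0] in R and key[2] in B}


def bad_moves(w1, r1, s, moves2, gamma):
    """B^gamma(s): Player Two moves b at s with w^1(b) < r^1(s) - gamma."""
    return {b for b in moves2 if w1[(s, b)] < r1[s] - gamma}


def z(y, s, bad):
    """z^gamma(s): the probability that y assigns at s to the moves in B^gamma(s)."""
    return sum((y[s][b] for b in bad), F(0))


trans = {
    ("s", "a", "b1"): {"t": F(1)},
    ("s", "a", "b2"): {"t": F(1, 2), "u": F(1, 2)},
    ("t", "a", "b1"): {"s": F(1)},
}
R = {"s", "t"}
print(sorted(primitive_exits(trans, R)), sorted(b_exits(trans, R, {"b1"})))
```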

For any stationary strategy $x \in X$ (or $y \in Y$) define a simplification of $x$ to be another stationary strategy $\hat x \in X$ obtained from $x$ by dropping the use of certain moves and then normalizing what remains. Call the simplification a $\gamma$-simplification if the frequency of the moves removed does not exceed $\gamma$. The simplification is within a set $T$ of states if changes were made only within the set $T$.
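A $\gamma$-simplification within a set $T$ can be sketched as follows. We read "the frequency of the moves removed" as the probability removed at each single state, which is an assumption. The dictionary encoding of stationary strategies is also ours.

```python
from fractions import Fraction as F


def simplify(x, drop, within, gamma):
    """gamma-simplification of a stationary strategy x within the set `within`.

    x: dict state -> dict move -> probability.
    drop: dict state -> set of moves to remove; only states in `within` change.
    Raises ValueError if the removed frequency at a state exceeds gamma or if
    nothing would remain to normalize.
    """
    result = {}
    for s, mixed in x.items():
        removed = drop.get(s, set()) if s in within else set()
        mass = sum((p for m, p in mixed.items() if m in removed), F(0))
        if mass > gamma:
            raise ValueError(f"removed frequency {mass} at {s} exceeds {gamma}")
        kept = {m: p for m, p in mixed.items() if m not in removed}
        total = sum(kept.values(), F(0))
        if total == 0:
            raise ValueError(f"no moves left at {s}")
        result[s] = {m: p / total for m, p in kept.items()}
    return result


x = {"s": {"a": F(9, 10), "b": F(1, 10)}, "t": {"a": F(1, 2), "b": F(1, 2)}}
print(simplify(x, {"s": {"b"}, "t": {"b"}}, {"s"}, F(1, 5)))
```

Only the state `"s"` lies in the set, so the request to drop `"b"` at `"t"` is ignored.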

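Theorem 1: Assume for every choice of positive $1/2 > \epsilon > \epsilon > \epsilon > \epsilon > 0$ with $\epsilon < \epsilon^3/(50|\mathcal{N}|)$, $\epsilon < \ldots$ and $\epsilon < \epsilon\epsilon/(40|\mathcal{N}|)$ that

1) there are absorbing stationary strategies $(x,y) \in X \times Y$ with

a) $r^2_{x,y}(s) > j^2_x(s) - \epsilon/2$ for all $s \in \mathcal{N}$,

b) $r^1_{x,y}(s) > j^1_y(s) - \epsilon/2$ for all $s \in \mathcal{N}$, and

c) for every move $a$ of Player One used in $x$ with positive probability at $s$ we have $|w^1_{x,y}(a) - r^1_{x,y}(s)| < \epsilon$,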

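2) a partition $\mathcal{R}$ of a subset $P \subseteq \mathcal{N}$ and, for every $R \in \mathcal{R}$, a set $B_R$ of Player Two moves in $R$ containing $B^\epsilon_{x,y}(R)$ such that

a) $z^\epsilon_{x,y}(s) < \epsilon$ for all $s \notin P$ and

b) for every distinct $s, t \in R \in \mathcal{R}$ the probability of reaching $s$ from $t$ before using a member of $E_{B_R}(R)$ is at least $1 - \gamma^*$, with $\gamma^* := \epsilon\epsilon/(40n|\mathcal{N}|)$,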

and for any $R \in \mathcal{R}$ with $z^\epsilon_{x,y}(R) > \epsilon$ there are a special subset $D_R \subseteq R$, a representative $s_R \in D_R$ and

3) an $\epsilon$-simplification $y_R$ of $y$ within $R$, created by removing the set $B_R$ of moves, such that

a) $v^2_{x,y}(b) < r^2_{x,y}(s)$ for every $b \in B_R \cap A^s_2$ and

b) $|r^1_{x,y_R}(s_R) - r^1_{x,y}(s_R)| < \epsilon$,

4) $\epsilon$-simplifications $(x_c, y_c)$ of $(x,y)$ within $D_R$ such that with $(x_c, y_c)$ the play never leaves the set $D_R$ and from any state in $D_R$ all other states in $D_R$ are reached with probability one, and

5) a strategy $y_d$ created from $y_c$ by adding to $y_c$ in the set $D_R$ small probabilities of using a subset $V_R$ of the Player Two moves used in $D_R$, with $V_R \subseteq B^\epsilon_{x,y}(R)$, and a real positive value $\xi_R < r^2_{x,y}(s_R) - 2.4\epsilon$ such that

a) with $(x_c, y_d)$, for every pair $s, t \in D_R$ the probability of reaching $s$ from $t$ before using a member of $V_R$ is at least $1 - \gamma^*$,

b) $r^2_{x_c,y_d}(t) > j^2_x(t) - \epsilon$ for all $t \in D_R$,

c) $|w^2_{x_c,y_d}(b) - \xi_R| < \epsilon$ for all moves $b \in V_R$, and

d) $|r^1_{x_c,y_d}(s) - r^1_{x,y}(s)| < \epsilon$.

Conclusion: Under the assumption that Player One can send transition-independent signals, the stochastic game has approximate equilibria.

Proof: We define the set $B$ of Player Two moves to be $\bigcup_{R \in \mathcal{R}} B_R \cup \bigcup_{s \notin P} B^\epsilon_{x,y}(s)$, and define the exits to be the $B$ exits. Let the corresponding state spaces $\mathcal{S}^*$ and $\mathcal{S}_\sharp$ from Lemma 3.6 be induced by $(x,y)$ and the partition $\mathcal{R} \cup \{\{s\} \mid s \notin P\}$. For every $s_R \in \mathcal{S}_\sharp$ let $p^*_R$ be the transition at $s_R$ in $\mathcal{S}_\sharp$ induced by the Player Two moves in $B^\epsilon_{x,y}(R)$. For every $R \in \mathcal{R}$ define $p_R$ to be the alternative transition from $s_R$ in $\mathcal{S}_\sharp$ induced by the Player Two moves $V_R$ according to $(x_c, y_d)$. Define $q^c_R$ to be the transition induced by the moves in $B_R$, and define $q^d_R$ so that $q^c_R$ is the disjoint union of $q^d_R$ with $p^*_R$.

We will confirm the conditions of Proposition 3.9 on the state space $\mathcal{S}_\sharp$, with $2.4\epsilon$, $2\epsilon$ and $2\epsilon$ in the roles of the quantities $\epsilon$, $\delta$ and $\gamma$ of that proposition, respectively.

First, by Lemma 3.6 the Markov chain on $\mathcal{S}_\sharp$ is absorbing. For $i = 1, 2$ let $r^i_\sharp: \mathcal{S}_\sharp \to \mathbb{R}$ be the harmonic function that agrees with the function $r^i$ on the absorbing states. If $v_\sharp(p^*_{s_R}) > 2\epsilon$ then by Lemma 3.7 $z^\epsilon_{x,y}(R) > 3\epsilon/2$, and if $s \notin P$ then $v_\sharp(p^*_s) < 1.1\,z^\epsilon_{x,y}(s) < 1.1\epsilon$. By Lemma 3.6 we have for every representative $s_R$ that $r^i_\sharp(s_R)$ is within $4\gamma^*|\mathcal{N}|$ of $r^i_{x,y}(s_R)$. Equally important, Lemma 3.7 implies that $w_{r^2_\sharp}(p^*_R) < r^2_\sharp(s_R) - 2.4\epsilon$ and $|v_{r^1_\sharp}(p_R) - r^1_\sharp(s_R)| < 11\epsilon/10$. Since $q^d_R$ is induced by some $B_R$ moves, by Lemma 3.7 and Condition 3a we have $\big(v_{r^2_\sharp}(q^d_R) - r^2_\sharp(s_R)\big)\,v_\sharp(q^d_R) < 2\epsilon$.

It remains to confirm that $|v_{r^1_\sharp}(q_R) - r^1_\sharp(s_R)| < 2\epsilon$. We apply Lemma 3.6 to the pair $(x, y_R)$ and the transitions it induces on $\mathcal{S}_\sharp$. Since the avoidance of exits by $(x,y)$ implies the same for the pair $(x, y_R)$, we have $|\tilde r^1(s_R) - r^1_{x,y_R}(s_R)| < 4\gamma^*|\mathcal{N}|$, where $\tilde r^1$ is the harmonic function induced by $(x, y_R)$ on $\mathcal{S}_\sharp$. The value $\tilde r^1(s_R)$ is equal to $v_{r^1_\sharp}(q_R)$. With the given $|r^1_{x,y_R}(s_R) - r^1_{x,y}(s_R)| < \epsilon$ and the above relation of $r^1_{x,y}$ to $r^1_\sharp$, we are done establishing the conditions of Proposition 3.9.

We apply Proposition 3.9 to $\mathcal{S}_\sharp$ with $\mathcal{T}$ the subset of $\mathcal{R}$ that has been polarized. We conclude that the new harmonic functions $r^i_{\mathcal{T}} := (r^i_\sharp)_{\mathcal{T}}$ on $\mathcal{S}_\sharp$ satisfy $|r^i_{\mathcal{T}}(s) - r^i_{x,y}(s)| < 3\epsilon$ for all $s \notin P$ and $|r^i_{\mathcal{T}}(s_R) - r^i_{x,y}(s_R)| < 3\epsilon$ for all $R \in \mathcal{R}$.

Next we define the situations $\mathcal{S}$, with one, two, or three situations defined for each original state in $S$. For any $s \notin P$, or for $s \in R \in \mathcal{R}$ with $R \notin \mathcal{T}$ not polarized, there is only the situation $s_e$ (including the case of absorbing states). We always start the game at an $s_e$. At any situation $s_e$ for $s \notin P$ or $s$ in a non-polarized $R$, the players perform $(x_s, \hat y_s)$, where $\hat y_s$ is the $\gamma^*$-simplification of $y_s$ resulting from the removal of all Player Two moves in $B^\epsilon_{x,y}(s)$. If $s$ is in a polarized $R \in \mathcal{T}$ and is not the representative $s_R$, the players perform $(x, y_R)$. Following any $s_e$ other than $s = s_R$ the next situation is $t_e$, where $t$ is the next state in $S$. Also, following the performance of an exit, no matter what the situation was on the previous stage, if $t \in S$ occurs on the following stage then the next situation is also $t_e$. This means that only motion inside of an $R \in \mathcal{T}$ involves situations other than those with the subscript $e$.

At any $s \in R \in \mathcal{T}$ there are either two situations $s_e$ and $s_f$ if $s \notin D_R$, or three situations $s_e$, $s_f$ and $s_g$ if $s \in D_R$. For such an $R \in \mathcal{T}$ let $\lambda_R$ be the quantity determined by the application of Proposition 3.9 to the transitions on $\mathcal{S}_\sharp$. Since Player One can send signals, at every $s_R \in D_R$ for a polarized $R \in \mathcal{T}$ we associate one of every pair of her moves with the symbol $f$ and the other with the symbol $g$. If $(s_R)_e$ is the present situation then with probability $\lambda_R$ Player One chooses a move associated with the symbol $g$ and with probability $1 - \lambda_R$ a move associated with the symbol $f$; in both cases the players perform $(x_c, y_c)$. (Because all moves are paired, we can modify $x_c$ to use only those moves corresponding to $f$ or only those corresponding to $g$ without changing the transition probabilities in the space $S$.) If $t$ is the next state and a move corresponding to $f$ was used, then $t_f$ is the next situation; otherwise the next situation is $t_g$. At any $s_f$ with $s \in R \in \mathcal{T}$ the play continues according to $(x, y_R)$, always to a next situation $t_f$ if there was no use of an exit. On the other hand, from any $s_g$ with $s \in D_R$ the motion follows $(x_c, y_d)$, and unless a move from $V_R$ is used the next situation is a $t_g$, necessarily with $t \in D_R$.

Define $\hat r^i$ to be the harmonic function on $\mathcal{S}$ determined by the above stationary behavior, with $\hat r^i = r^i$ on the absorbing states. Given the above conditions, to apply Corollary 4.4 it suffices that neither player $i$ can change the expected value of $\hat r^i$ by more than $10\epsilon$ at any one stage. With the role of the $\xi_R$, we need only show that $\hat r^i$ is within $\epsilon$ of $r^i_{\mathcal{T}}$ on all the $s_R$ and the $s \notin P$. To do this, we introduce two new transitions defined on $\mathcal{S}$, indexed by $o$ and $l$. The transitions $p^l_s$ and $p^o_s$ are identical at states $s$ that are not in a polarized $R$, and there they are the transition induced by the behavior at the situation $s_e$. At $s$ in a polarized $R \in \mathcal{T}$, $p^l_s$ is the distribution determined by the next situation $t_e$ following the situation $s_e$, while $p^o_s$ is determined by the next situation $t_e$ conditioned on having reached either $(s_R)_f$ or $(s_R)_g$ before any exit was performed. The $p^o$ transitions generate harmonic functions $r^i_o$ that are identical to $r^i_{\mathcal{T}}$ on $\mathcal{S}_\sharp$, and the $p^l$ transitions generate harmonic functions $r^i_l$ that are identical to $\hat r^i$ on the subset $\{s_e \mid s \in S\}$. Because $\lambda_R$ cannot be greater than $1 - 2\epsilon$ and the probability from a situation $s_e$ that an exit from the stationary strategies $(x, y_R)$ is used before getting to $(s_R)_e$ is no more than $\gamma^*$, for every $s, t \in S$ the transition probability $p^l_s(t)$ does not differ by more than a factor of $\gamma^*/\epsilon$ from $p^o_s(t)$. Finally, Lemma 3.5 implies that the functions $r^i_o$ and $r^i_l$ do not differ by more than $4\gamma^*|\mathcal{N}|/\epsilon < \epsilon$. □

5 The auxiliary game 

The main issue is to define the "correct" discounted evaluation of Player Two since, as shown in Solan (2000), a naive definition of his discounted evaluation does not prove equilibrium existence when there are multiple non-absorbing states.

We assume that positive $\epsilon$ and $\epsilon$ have been fixed.

5.1 The function $\xi$

Let $b$ be any move of Player Two at a state $s \in \mathcal{N}$.
For any $(x,y) \in X \times Y$ define






T,b&By S b9 b x ,y Note that 



(6)
with $v^2(b) := r^2(s)$ if $g_b = \bar g_b = 0$.

Next we need to use large quantities $Q_1 > 1$ and $Q_2 > 1$ that will be determined precisely later (in the next section) by the choice of $\alpha$, $\epsilon$, $\epsilon$, $\epsilon$ and $\epsilon$. Define $L := Q_1Q_2$ and define $K := L^L$.

Define the function $h: [1,\infty) \to \mathbb{R}$ by $h(r) = \min\{r, K\}$. Order the members $\{s_1, \ldots, s_m\}$ of $\mathcal{N}$ so that $a_{x,y}(s_1) \le a_{x,y}(s_2) \le \cdots \le a_{x,y}(s_m)$. Define

$$w_{x,y}(s_k) := \prod_{j=k}^{m-1} h\!\left(\frac{a_{x,y}(s_{j+1})}{a_{x,y}(s_j)}\right).$$

For any move $b$ at a state $s \in \mathcal{N}$ define $\bar g_b$ to satisfy

$$(1 - \bar g_b) = (1 - g_b)\left(1 - \frac{\delta}{w_{x,y}(s)}\right). \qquad (7)$$

If $\bar g_b = 1$, then $g_b = 1$ as well. Note that

«?V(&) + (1 - g b )fr\s) = g b v 2 (b) + - g b )r 2 (s) = ~g b v 2 (b). (8) 

For every $s \in \mathcal{N}$ and $h \in \mathcal{H}$ denote $N_s(h) := \#\{n \in \mathbb{N} \mid s_n = s\} \in \mathbb{N} \cup \{\infty\}$. For $1 \le i \le N_s(h)$ let $n^s_i(h)$ be the stage of the $i$th occurrence of the state $s$ in $h$. If the initial state of $h$ is $s$, then $n^s_1 = 0$ and $N_s(h) \ge 1$.

Define the discounted evaluation of a move $b$ at a state $s \in \mathcal{N}$ according to

&, = E b x y{h) [ e - -^r 1 n(i 

i=l UJ x,y\ t >) k=l 

c N s (h)-1 

JJ (i_^ S («) ], ( 9 ) 

where $E^b$ stands for the expectation over all infinite histories $h \in \mathcal{H}$ with initial state $s_0 = s$, assuming that Player Two plays the action $b$ at stage 0, the first stage, and afterwards follows $y$, whereas Player One always follows $x$.

Lemma 5.1 The function $\xi$ obeys the properties

$$\xi^b_{x,y} = \tilde g^b_{x,y}\,\tilde v^2(b) + (1 - \bar g^b_{x,y})\,\xi_{x,y}(s) \qquad (10)$$

and

$$r^2_{x,y}(s) = \left(1 + \frac{\delta\,(1 - a_{x,y}(s))}{w_{x,y}(s)\,a_{x,y}(s)}\right)\xi_{x,y}(s), \qquad (11)$$

where $\xi_{x,y}(s) = \sum_{b \in A^s_2} y_b\,\xi^b_{x,y}$.
Proof:

We now verify that $\xi$ satisfies (10) and (11). Separate the summation in (9) into three parts.

• All histories such that $N_s(h) = 1$. The probability of this event is $g_b$, and the conditional expectation is $v^2(b)$.

• All histories such that $N_s(h) > 1$ and $i = 1$. The probability of this event is $1 - g_b$, and the conditional expectation is $\ldots$.

• All histories such that $N_s(h) > 1$ and $i > 1$. The probability of this event is $1 - g_b$. Factoring out one power of $(1 - g_b)(1 - \delta/w(s))$ leaves $\xi(s)$, so by (7) this part contributes $(1 - \bar g_b)\xi(s)$ to the sum.

Putting together the three parts, with (8) connecting the first two parts, we get (10). For equation (11) we use (10) and take the expectation with respect to the moves. □
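Formula (11) can be reproduced in the simplest setting of a single non-absorbing state. The model below is an assumption made for illustration and is not definition (9): after move $b$ the play never returns with probability $g_b$ (expected payoff $v_b$), and each return survives the discounting with probability $1 - \delta/w(s)$, paying 0 otherwise. Under this reading the stationary recursion $\xi = \sum_b y_b\big(g_b v_b + (1-g_b)(1-\delta/w(s))\,\xi\big)$ gives exactly the value predicted by (11).

```python
from fractions import Fraction as F


def xi_by_recursion(y, g, v, delta, w):
    """Simplified single-state model (an assumption made for illustration).

    At the non-absorbing state s Player Two plays move b with probability y[b];
    play then never returns to s with probability g[b], with expected payoff v[b],
    and otherwise returns.  Each return survives the discounting with probability
    1 - delta / w, contributing payoff 0 otherwise.  Solve
        xi = sum_b y[b] * (g[b] * v[b] + (1 - g[b]) * (1 - delta / w) * xi).
    """
    beta = delta / w
    const = sum(y[b] * g[b] * v[b] for b in y)
    coef = sum(y[b] * (1 - g[b]) * (1 - beta) for b in y)
    return const / (1 - coef)


def xi_by_formula(y, g, v, delta, w):
    """Return (xi, r) with xi obtained from r via (11)."""
    a = sum(y[b] * g[b] for b in y)              # a(s): probability of no return
    r = sum(y[b] * g[b] * v[b] for b in y) / a   # undiscounted payoff r^2(s)
    return r / (1 + delta * (1 - a) / (w * a)), r


y = {"L": F(1, 3), "R": F(2, 3)}
g = {"L": F(1, 10), "R": F(1, 20)}
v = {"L": F(1), "R": F(1, 4)}
xi_rec = xi_by_recursion(y, g, v, F(1, 100), F(3))
xi_11, r = xi_by_formula(y, g, v, F(1, 100), F(3))
print(xi_rec, xi_11, r)
```

With exact fractions the two computations agree, and $\xi \le r$, as noted after Lemma 5.1.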

Notice that formula (11) is a slight variation of the standard relationship between discounted and undiscounted evaluations. $\xi$ will serve as the auxiliary discounted payoff evaluation of Player Two. Note that $r^2_{x,y}(s) \ge \xi_{x,y}(s)$ for all $s \in \mathcal{N}$. Define $\bar\xi_{x,y}(s)$ to be the maximal value $\max_{b \in A^s_2}\xi^b_{x,y}$.

Lemma 5.2: For every $s, t \in \mathcal{N}$, $\gamma > 0$, and $(\delta, x, y) \in (0,1] \times X \times Y$:

• $a_{x,y}(t) \le K a_{x,y}(s)$ implies that $w_{x,y}(t)a_{x,y}(t) \le w_{x,y}(s)a_{x,y}(s)$,

• $w_{x,y}(s)a_{x,y}(s) \le w_{x,y}(t)a_{x,y}(t)$ and $r^2_{x,y}(s) \le r^2_{x,y}(t) + \gamma$ imply that $\xi_{x,y}(s) \le \xi_{x,y}(t) + \gamma + \delta$.

Proof: The first part follows directly from the definition of $w$. For the second part, note that for every $r, w > 0$, $0 < a \le 1$ and $0 < \delta \le 1$

$$0 \le \frac{rwa}{wa + \delta(1-a)} - \frac{rwa}{wa + \delta} = \frac{rwa^2\delta}{(wa + \delta)(wa + \delta - \delta a)} \le \frac{rwa^2\delta}{w^2a^2} = \frac{r\delta}{w}.$$

Moreover, $\frac{rwa}{wa + \delta}$ is an increasing function of $wa$. Given $w \ge 1$, from the above we have that $r^2(s)$ and $w(s)a(s)$ determine $\xi(s)$ up to a quantity of no more than $\delta$. □
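The weights $w$ and both parts of Lemma 5.2 can be tested numerically. The sketch assumes the reading $w_{x,y}(s_k) = \prod_{j=k}^{m-1} h\big(a_{x,y}(s_{j+1})/a_{x,y}(s_j)\big)$ with $h(r) = \min\{r, K\}$ and the states ordered by increasing $a_{x,y}$. It computes $\xi$ from $r^2$ by inverting (11) and checks the lemma on random instances with payoffs in $[0,1]$, an assumption needed for the bound $r\delta/w \le \delta$.

```python
import random


def weights(a, K):
    """w(s_k) = prod_{j=k}^{m-1} min{a(s_{j+1}) / a(s_j), K}, states sorted by a.

    The state with the largest a gets weight 1 (empty product).
    """
    order = sorted(a, key=a.get)
    w = {order[-1]: 1.0}
    for k in range(len(order) - 2, -1, -1):
        lo, hi = order[k], order[k + 1]
        w[lo] = w[hi] * min(a[hi] / a[lo], K)
    return w


def xi(r, w, a, delta):
    """xi(s) obtained from r^2(s) by inverting (11)."""
    return r / (1 + delta * (1 - a) / (w * a))


def check_lemma_5_2(seed, m=6, K=50.0):
    """Random instance: assert both bullets of Lemma 5.2 (with float tolerance)."""
    rng = random.Random(seed)
    a = {s: rng.uniform(1e-4, 1.0) for s in range(m)}
    r = {s: rng.uniform(0.0, 1.0) for s in range(m)}
    w = weights(a, K)
    assert all(val >= 1.0 for val in w.values())
    for delta in (1e-3, 0.1, 1.0):
        for s in range(m):
            for t in range(m):
                if a[t] <= K * a[s]:  # first bullet
                    assert w[t] * a[t] <= w[s] * a[s] * (1 + 1e-9)
                gamma = max(0.0, r[s] - r[t])  # smallest gamma with r(s) <= r(t) + gamma
                if w[s] * a[s] <= w[t] * a[t]:  # second bullet
                    bound = xi(r[t], w[t], a[t], delta) + gamma + delta
                    assert xi(r[s], w[s], a[s], delta) <= bound + 1e-12
    return True


print(all(check_lemma_5_2(seed) for seed in range(20)))
```

When all ratios between consecutive states stay below $K$, the product telescopes and $w(s)a(s)$ is the same for those states, which is why the first bullet holds with equality in that range.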



5.2 The Best Reply Correspondence 

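For every state $s \in \mathcal{N}$ define

$B^s_{\delta,1}(x,y) = \operatorname{argmax}_{a \in A^s_1} w^1_{x,y}(a)$,

$B^s_{\delta,2}(x,y) = \operatorname{argmax}_{b \in A^s_2} \xi^b_{x,y}$ if $\bar\xi_{x,y}(s) > j_x(s)$,

$B^s_{\delta,2}(x,y) = J_x(s) \cup \operatorname{argmax}_{b \in A^s_2} \xi^b_{x,y}$ if $\bar\xi_{x,y}(s) = j_x(s)$,

$B^s_{\delta,2}(x,y) = J_x(s)$ if $\bar\xi_{x,y}(s) < j_x(s)$.

Player One maximizes her undiscounted payoff, while Player Two maximizes his auxiliary payoff, provided that it is not too small.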

Let the correspondences $\bar B^s_{\delta,1}$ and $\bar B^s_{\delta,2}$ be those defined by the closure of the graphs of the correspondences $B^s_{\delta,1}$ and $B^s_{\delta,2}$ in $(X \times Y) \times A^s_1$ and $(X \times Y) \times A^s_2$, respectively. Define $\mathrm{conv}(\bar B^s_{\delta,1})$ and $\mathrm{conv}(\bar B^s_{\delta,2})$ to be the correspondences with graphs in $(X \times Y) \times X^s$ and $(X \times Y) \times Y^s$, respectively, such that $z \in \mathrm{conv}(\bar B^s_{\delta,1}(x,y))$ if and only if $\{a \in A^s_1 \mid z_a > 0\}$ is a subset of $\bar B^s_{\delta,1}(x,y)$, and $z \in \mathrm{conv}(\bar B^s_{\delta,2}(x,y))$ if and only if $\{b \in A^s_2 \mid z_b > 0\}$ is a subset of $\bar B^s_{\delta,2}(x,y)$. Define the correspondence $B_{\delta,1}$ from $X \times Y$ to $X$ so that $(x,y)$ in the domain corresponds to the sets $\mathrm{conv}(\bar B^s_{\delta,1}(x,y))$, $s \in \mathcal{N}$, in the range, and likewise define the correspondence $B_{\delta,2}$ from $X \times Y$ to $Y$. We define the correspondence $F_\delta: X \times Y \rightrightarrows X \times Y$ by $F_\delta(x,y) = (B_{\delta,1}(x,y), B_{\delta,2}(x,y))$. By Kakutani's fixed point theorem, for every $\delta > 0$ the correspondence $F_\delta$ has a fixed point.
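At a single state, the rule $B^s_{\delta,2}$ for Player Two can be sketched as a function returning the set of allowed moves. This assumes our reading of the three cases, namely that $\bar\xi_{x,y}(s) = \max_b \xi^b_{x,y}$ is compared with the jump level $j_x(s)$. The closure, convexification and fixed-point steps are not modelled, and the function name is hypothetical.

```python
def player_two_best_reply(xi_b, jump_level, jump):
    """Moves allowed by B^s_{delta,2} at one state s (our reading of its cases).

    xi_b: dict move -> xi^b_{x,y}; jump_level: j_x(s); jump: the set J_x(s).
    """
    best = max(xi_b.values())  # bar-xi_{x,y}(s)
    argmax = {b for b, val in xi_b.items() if val == best}
    if best > jump_level:
        return argmax
    if best == jump_level:
        return set(jump) | argmax
    return set(jump)


xi_b = {"L": 0.3, "R": 0.5}
print(player_two_best_reply(xi_b, 0.4, {"L"}))
```

The three calls in the test cover the cases $\bar\xi > j_x(s)$, $\bar\xi = j_x(s)$ and $\bar\xi < j_x(s)$.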



5.3 Two Lemmas on Fixed Points 

We assume in the rest of the section that $(x,y)$ is a fixed point of $F_\delta$. We prove Lemmas 5.4 and 5.5, described in the introduction.

Remark 5.3: Since the jump correspondence is used before $\xi$ gets close to 0, any fixed point $(x,y)$ of $F_\delta$ is absorbing. This implies that $r^2_{x,y}(s) \ge j_x(s)$ for all $s \in \mathcal{N}$. Indeed, suppose for the sake of contradiction that $r^2_{x,y}(s) < j_x(s)$. Denote by $e$ the stopping time defined by the first stage in which the game leaves the set $\{u \mid \xi_{x,y}(u) < j_x(u)\}$. Recall from Section 2 that $j_x$ is a sub-martingale. Since $\xi(s) = j_x(s) = r^2(s)$ for every absorbing state $s \in \mathcal{A}$, we have $j_x(s) \le E j_x(s_e) \le E\xi_{x,y}(s_e) \le E r^2_{x,y}(s_e) = r^2_{x,y}(s)$, as desired.

Lemma 5.4 If $\epsilon < \alpha\omega/4$ then there is a choice of $L^* > 1$ and $\delta^* > 0$ such that if $L > L^*$, $0 < \delta < \delta^*$ and $(x,y)$ is a fixed point of $F_\delta$ then

1) $\xi_{x,y}(s) > j_x(s) - \ldots$ for all $s \in \mathcal{N}$,

2) if the jump correspondence is used at $s$ then $\xi_{x,y}(s) < r^2_{x,y}(s) - 3\epsilon$,

3) for any action $b$ from $J_x(s)$ used in $y_s$, $g_b < \epsilon$, and

4) the overall probability that Player Two plays an action from $J_x(s)$ at any $s \in \mathcal{N}$ is at most $\alpha\omega/20$.

Proof: Let $L^* = \ldots$ and $\delta^* = \epsilon\alpha^3\omega^3/(300|\mathcal{N}|)$. Choose $t$ to be a member of $\mathcal{N}$ with the largest difference $j_x(t) - \xi(t)$; we must presume that this difference is non-negative. We will show that this difference can be no more than $\ldots$ and that the frequency devoted to the jump correspondence at any such state can be no more than $\alpha\omega/20$.

We presume for the sake of contradiction that the frequency devoted to the jump correspondence at $t$ is at least $\alpha\omega/20$. Since $r^2 \ge j_x$, the expected value of the jump function $j_x$ at the states reached on the next stage after $t$ using the jump correspondence $J_x$ is at least $\alpha\omega$ more than $j_x(t)$, so we must assume for any move from $J_x(t)$ that there is at least one state $u$ reached by this move with a probability of at least $\ldots$ such that $j_x(u) > j_x(t) + \alpha\omega/2$, necessarily with $\mathrm{esc}(u,t) < \alpha\omega/4$. (If $\mathrm{esc}(u,t) \ge \alpha\omega/4$ then $a(t) > \alpha^3\omega^3/(160|\mathcal{N}|)$, and by (11) and the size of $\delta^*$ we would have made $\xi(t)$ too close to $r^2(t)$, contradicting $j_x(t) < r^2(t) - \alpha\omega/2$, which must follow by Remark 5.3 since otherwise any move from the jump correspondence would be evaluated in an undiscounted way strictly above the level $j_x(t)$.) By the definition of $w$, the size of $L^*$ and (3) we have $w(t)a(t) > w(u)a(u)$. By $\mathrm{esc}(u,t) < \alpha\omega/4$ it follows that $|r^2(t) - r^2(u)| < \alpha\omega/4$. But by Lemma 5.2 we have $\xi(t) \ge \xi(u) - \delta - \alpha\omega/4$. With the size of $\delta^*$ this contradicts $j_x(u) \ge j_x(t) + \alpha\omega/2$ and the choice of $t$.

Next, suppose for the sake of contradiction that $J_x$ is used at $s$ and $g_b \ge \epsilon$ for some move $b \in J_x(s)$. Indeed, $g_b > \epsilon$ implies that $g_b = 1$. In particular, using Remark 5.3, $\xi^b_{x,y} = w_{x,y}(b) \ge \sum_t p(t \mid s; x, b)\,j_x(t) \ge j_x(s) + \alpha\omega$. Thus, for every $b' \in B_2(x,y)$ that maximizes $\xi$, $\xi^{b'}_{x,y} \ge \xi^b_{x,y} \ge r^2_{x,y}(s) \ge j_x(s) + \alpha\omega/2$. Since the overall probability of playing actions from the jump correspondence is smaller than $\alpha\omega/20$, this contradicts the assumption $\xi(s) < j_x(s)$.

Now we presume for the sake of contradiction that $\xi(s) \ge r^2(s) - 3\epsilon$ and the $J_x$ correspondence is used at $s$. Since we must assume that $\xi(s) = j_x(s)$, we have an increase in the value of $r^2$ of at least $\alpha\omega - 3\epsilon$ from a move in $J_x$. By the dominance of $\alpha\omega$ over $4\epsilon$, we must conclude that $g_b > \epsilon$, a contradiction. □

Lemma 5.4 is the most problematic aspect of extending this proof to the 
case of finitely many positions. Any identification of infinitely many states 
as a single state may be meaningless if the states reached from it are not 
also identified. A more flexible definition of the discounted evaluation may 
be necessary. For example, at a state s one could discount future visits to 
other states t according to the difference between Player Two's undiscounted 
expected payoffs from these two states. 

The following lemma claims that if the auxiliary payoff is too far from the real payoff and the action causes absorption with small probability, then this probability is very small. This radical discontinuity is the key to our whole approach.

Lemma 5.5 For $L$, $\alpha$, $\epsilon$ and $\delta$ satisfying the conditions of Lemma 5.4 and $(x,y)$ a fixed point of $F_\delta$, if $\xi(s) < r^2(s) - 2\epsilon$ and $g_b < \epsilon$ then $g_b < 1.1\,\delta\xi(s)/w(s)$ and $g_b < 2.3\,\epsilon a(s) < 2.3\,a(s)$.

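Proof: First we claim that $\bar\xi(s) - \xi(s) < \frac{\alpha\omega}{20 - \alpha\omega}\cdot\frac{\delta\,\xi(s)}{w(s)}$.
If the jump correspondence at $s$ is used and $b$ is such a move, then since $g_b < \epsilon$ (from Lemma 5.4) it follows that $g_b = \bar g_b/\epsilon$. Hence from (6) we have

$$v^2(b) = (1 - \epsilon)r^2(s) + \epsilon\bar v^2(b) \ge r^2(s) - \epsilon > \xi(s) + \epsilon. \qquad (12)$$

Moreover, from (10) and (12) we have

$$\xi^b \ge \xi(s) + \tilde g_b\big(\tilde v_b - \xi(s)\big) - \delta\xi(s)/w(s) > \xi(s)\big(1 - \delta/w(s)\big),$$

and by Lemma 5.4, since $\xi(s)$ is the average of $\bar\xi(s)$ and such $\xi^b$, we have $(1 - \alpha\omega/20)(\bar\xi(s) - \xi(s)) < \frac{\alpha\omega}{20}\cdot\frac{\delta\xi(s)}{w(s)}$, so the claim follows.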

Considering now any move $b$ that is used with $g_b < \epsilon$ and looking again at formula (10), we have $\bar\xi(s) \ge \xi^b \ge \tilde g_b(\tilde v_b - \xi(s)) + (1 - \delta/w(s))\xi(s)$ and hence $\tilde g_b(\tilde v_b - \xi(s)) < 1.1\,\delta\xi(s)/w(s)$, since by the above claim $\bar\xi(s) - \xi(s)$ is small compared to $\frac{\delta}{w(s)}\xi(s)$. First consider the consequence of $\tilde v_b - \xi(s) > \epsilon$, namely $g_b = \epsilon\tilde g_b < 1.1\,\delta\xi(s)/w(s)$. Second, consider $\tilde v_b - \xi(s) \le \epsilon$.
1.1 6£(s) ^ 1.1 S£(s) i q. r 2 (s)-£(s)-I ^ -, in

,~ b ,, \ -, >, < , 2i \ ti \ \ ? proven above. Since \ -r\ ^ > 1/2, we 
get # 6 < (^(^l)^^) - Now apply formula (11) for 2e < r 2 (s) - f(s) = 

£( g ) awaff - Since *( s ) - 1 and ^( s ) - 1 we have ^ - 5 ( s )) - 2 ^( s )' 
and from $\delta < \epsilon/25$ we have $a(s) < 1/50$. This allows us to conclude with

4 = g b < f 2.2 a(s) < 2.3 a(s) < □ 



6 Second Main Theorem 

The goal of this section is to prove Theorem 2, which states that the conditions of Theorem 1 are always satisfied. First we need a simple but useful lemma.

Lemma 6.1 For every two distinct non-absorbing states $s, t$ with $\mathrm{esc}(t,s) < \gamma < 1$ in an absorbing time homogeneous Markov chain, $P^t(t,s)/\mu(s,t)$ does not differ from $a(t)$ by more than a factor of $2\gamma$, and $\mathrm{esc}(t,s)/\mu(t,s)$ is within a factor of $3\gamma$ of the ratio that, starting at $t$ or $s$, the last visit before absorption was at $t$ rather than at $s$. Furthermore, with or without the assumption that the Markov chain is absorbing, and with a start at either $s$ or $t$, the ratio of the expected number of visits to $s$ to those to $t$ is at least $1 - 4\gamma$ times the ratio of $P^t(t,s)$ to $P^s(s,t)$.

Proof: The first two claims follow directly from the formulas (1) and (2). The third claim follows from the first claim if the Markov chain is absorbing. Otherwise we recognize in $1/P^t(t,s)$ the expected number of visits to $t$ before reaching $s$. □

Remark 6.2 At a fixed point of $F_\delta$ satisfying the properties of Lemmas 5.4 and 5.5, if $\xi(s) < r^2(s) - 2\epsilon$ and $b$ is a move at $s$ with $g_b > \epsilon$, then $w^2(b) = \bar\xi(s)$, which by Lemma 5.5 is also within $\delta/20$ of $\xi(s)$.

Theorem 2: For any choice of positive $\epsilon$, $\epsilon$, $\epsilon$, and $\epsilon$ satisfying the inequalities stated in Theorem 1, all conditions of Theorem 1 are satisfied.

Proof: Because it is sufficient to demonstrate the conclusion of Theorem 1 with smaller choices for $\epsilon$, $\epsilon$ and $\epsilon$, we will assume without loss of generality that $\alpha$ is small enough that for every $s \in S$, $c^2_\alpha(s)$ is within $\epsilon/2$ of the undiscounted zero-sum value $c^2(s)$, as described in Section 2, and that $\epsilon < \alpha\omega/4$.

Define $\beta := \ldots$. We require that $L := Q_1Q_2$ is large enough
to satisfy the conditions of Lemma 5.4 and also that

Qi > 80|A/"| 3 m7(p e 2 e 2 e 2 (5 2 ) and Q 2 > 80m\Af\/(p e e e). 

We begin with $\delta$ sufficiently small, so that the condition of Lemma 5.4 holds. Next, we consider fixed points of $F_\delta$ corresponding to decreasing $\delta > 0$ that have convergent subsequences for certain variables living in compact spaces: the stationary strategies in the space $X \times Y$, the values $u(a,b)$ for all pairs of moves at all states, the expected payoffs $r^1(s)$, $r^2(s)$ and the absorption rate $a(s)$ for every $s \in \mathcal{N}$, and the probabilities $\mathrm{esc}(t,s)$ for all pairs of states.

We define a move $a \in A^s_1$ or $b \in A^s_2$ to be a limit move if and only if the frequency of its use does not converge to zero as $\delta$ goes to zero, and define $q$ to be the minimal positive limit value for a frequency of a limit move chosen by either player. We define the quantity $\rho$ to be the minimal positive limit value for $\mathrm{esc}(s,t)$, $\nu$ to be the minimal positive limit value for $u(a,b)$, and $\bar a$ to be the minimal positive limit value for $a(s)$.

Next we must define the partition $\mathcal{R}$ of a subset $P$. Define a directed graph on the space $\mathcal{N}$ by $t \to s$ if and only if in the limit $\mathrm{esc}(t,s)$ approaches zero. The relation is transitive, but not necessarily symmetric. It has an additional property: if $t \to s_1$ and $t \to s_2$ then either $s_1 \to s_2$ or $s_2 \to s_1$. This is easy to confirm, because if $s_1$ was not reached with probability approaching one on the way from $t$ to $s_2$ then it must be reached with probability approaching one after the state $s_2$. Next define a relation $\sim$ that is symmetric, transitive, and reflexive on an appropriate subset: $s \sim t$ if and only if $\mu(s,t)$ approaches zero, and $s \sim s$ if and only if $a(s)$ approaches zero. The relation $\sim$ defines a partition $\mathcal{P}$ of a subset $P'$ of $\mathcal{N}$. Now we relate $\to$ to $\sim$. Define $\mathcal{R}$ to be the subset of $\mathcal{P}$ given by $A \in \mathcal{R}$ if and only if $u \in A$ and $u \to s$ imply $s \in A$. Any state $s \notin A \in \mathcal{R}$ such that $\mathrm{esc}(s,u)$ approaches zero for any (equivalently some) state $u \in A \in \mathcal{R}$ is called a satellite of $A$. Due to the above, a satellite of $A \in \mathcal{R}$ cannot be a satellite of any other member of $\mathcal{R}$, and every member of a $Q \in \mathcal{P}$ with $Q \notin \mathcal{R}$ must be a satellite of the same $A \in \mathcal{R}$. We call a primitive exit $(a,b)$ from $R \in \mathcal{R}$ a satellite exit if with certainty the exit results in motion that does not leave $R$ or its collection of satellites.
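Given the limit relation $\to$ and the partition $\mathcal{P}$, selecting $\mathcal{R}$ and the satellites is a closure check. In this sketch both are supplied as hypothetical input data; computing the limits of $\mathrm{esc}$ and $\mu$ along fixed points is not modelled, and the function names are ours.

```python
def closed_classes(arrow, classes):
    """Members A of the partition such that u in A and u -> s imply s in A.

    arrow: dict t -> set of states s with t -> s (esc(t, s) -> 0 in the limit).
    classes: list of frozensets, the partition of the subset P'.
    """
    return [A for A in classes if all(s in A for u in A for s in arrow.get(u, ()))]


def satellites(arrow, A, states):
    """States s outside A with s -> u for some u in A."""
    return {s for s in states if s not in A and arrow.get(s, set()) & A}


# Hypothetical limit data: 1 and 2 reach each other, 3 and 4 lead into {1, 2}.
arrow = {1: {2}, 2: {1}, 3: {1, 2}, 4: {1, 2}}
classes = [frozenset({1, 2}), frozenset({4})]
states = {1, 2, 3, 4}
R_family = closed_classes(arrow, classes)
print(R_family, satellites(arrow, R_family[0], states))
```

The class $\{4\}$ is not closed because $4 \to 1$, so, as stated above, its member becomes a satellite of the closed class $\{1, 2\}$.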

For every $R \in \mathcal{R}$ we define the set of Player Two moves in $R$ to be
$B_R := \{b \in A^s_2 \mid$ for some limit move $a \in A^s_1$, $(a,b)$ is a primitive exit from $R$ that is not a satellite exit$\}$.

If $s$ is a satellite of $R \in \mathcal{R}$ then in the limit the probability that the last visit to the pair $s$ or any $u \in R$ was the state $s$ must go to zero. Therefore $u(a,b)$ approaches zero for any satellite exit $(a,b)$ at $u \in R \in \mathcal{R}$. These facts follow directly from Lemma 6.1 and $\mathrm{esc}(s,u)/\mu(u,s)$ going to zero in the limit.

We show for every $R \in \mathcal{R}$ and pair $s, t \in R$ that the probability of using some exit in $E_{B_R}(R)$ before reaching $t$ from $s$ also approaches zero. First, this holds for any non-satellite primitive exit from $R$, because the probability of reaching a non-satellite outside of $R$ would be at least $p$, and therefore the probability of absorbing before reaching $t$ must in the limit be at least the probability of using this exit times $p\rho$. The same argument holds for the use of any move in $B_R$, but with the quantity $p\rho q$ instead of $p\rho$.

More difficult is to show that the above holds for any satellite exit $(a,b)$ at $u \in R$. Let $v$ be any satellite of $R$ reached with positive probability from this exit. Let $\pi$ be the probability of using $(a,b)$ before reaching $t$ from a start at $s$, and let $\theta$ be a bound on the probability of not reaching any member of $R$ from any other member of $R$ or from a satellite of $R$. Let $\gamma$ be the probability of reaching $v$ from $s$ before reaching $t$, with $\gamma \ge \pi p$. Going through the state $v$, the probability of reaching $t$ is at least $1 - \theta$, and the combined probability of reaching $t$ from $s$ is also at least $1 - \theta$. This means that the probability of reaching $t$ from $s$ conditioned on not going through $v$ is at least $1 - \frac{\theta}{1-\gamma}$. So, conditioned on not arriving at $v$ before $t$, there is at most a probability $\frac{\theta}{1-\gamma}$ of absorbing before getting back to $s$. In the limit this quantity goes to zero, because $\theta$ goes to zero and in the limit $\gamma$ cannot go above $1 - \rho$. This means that eventually the probability of reaching $v$ from $s$ must be at least

$$\gamma\sum_{i=0}^{\infty}(1-\gamma)^i\Big(1 - \frac{\theta}{1-\gamma}\Big)^i = \gamma\sum_{i=0}^{\infty}(1 - \gamma - \theta)^i = \frac{\gamma}{\gamma + \theta}.$$

But this probability of reaching $v$ from $s$ cannot go above $1 - \rho$ in the limit, which is possible only if $\pi$ goes to zero as $\theta$ goes to zero.

Define $\epsilon^* := (\nu\rho q\bar a/K)^{\ldots}$. We require of a fixed point of $F_\delta$ that the values for which we have convergent subsequences are within $\epsilon^*$ of their limit values. We require that the probability of using any exit before moving from $s$ to $t$, for any pair $s, t \in R \in \mathcal{R}$, is no more than $\epsilon^*$, and for every $R \in \mathcal{R}$ that the sum of $u(a,b)$ over all the satellite exits $(a,b) \in E_{B_R}(R)$ is no more than $\epsilon^*$ (as demonstrated above). Furthermore we require that $\delta < (\epsilon^*)^2$. We let $(x,y)$ be a fixed point of $F_\delta$ satisfying these properties. If the stationary strategy is not specified, then $(x,y)$ is intended.

Step 1; For every s G R G 1Z show that if zl^(s) > (3 then there 
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for elite R that ^(t)/z^(s) > (1 - ^ - W/4,f (*)) = 



Denote T = T d , where d G (1, L^" 1 ) satisfies T Ld \ T d = 0. Since K = L^, 
for every t G T we have a(t) < a(t)/e < p(s,t)/e < da(s)/e < Kd(s), and it 
follows that w(t)a(t) < w(s)a(s). With £(s) < r(s)— 2.5e we have by (11) that 
a(s) < 5/e, a(s) < 5/e, and p{s,t) < 5K/e < e* , meaning that T is a subset 
of R. Since i G T satisfies |r 2 (s) - r 2 (t)| < e* we have f (i) < ^(s) + (e* + 5). 
Define a quantity 



Define the stationary strategy $\bar{x}$ by removing from $x$ all Player One moves at states $t\in T$ that are played with probability smaller than $\rho'/\rho$, and normalizing the remaining vector. This means that if $u$ is reached in one stage from $t\in T$ by $\bar{x}$ and a Player Two move $b$, then $p(u\mid t,\bar{x},b)\ge\rho'$.
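The pruning that produces $\bar{x}$ from $x$ can be sketched in Python as follows. This is a minimal illustration, not the paper's construction: the dictionary representation of a stationary strategy (state to {move: probability}) and the function names are hypothetical, and `threshold` stands for $\rho'/\rho$. If, as we assume here, $\rho$ is a lower bound on all positive one-stage transition probabilities, then any transition that survives the pruning has probability at least $(\rho'/\rho)\cdot\rho=\rho'$, since renormalization only increases the surviving probabilities.

```python
def prune_and_normalize(mixed_action, threshold):
    """Drop moves played with probability below `threshold`; renormalize the rest."""
    kept = {move: p for move, p in mixed_action.items() if p >= threshold}
    if not kept:
        raise ValueError("the threshold removes every move at this state")
    total = sum(kept.values())
    return {move: p / total for move, p in kept.items()}


def prune_strategy(strategy, pruned_states, threshold):
    """Prune only at the states in `pruned_states` (the set T); copy the rest unchanged."""
    return {
        state: prune_and_normalize(action, threshold) if state in pruned_states else dict(action)
        for state, action in strategy.items()
    }


# Hypothetical two-state example: prune at t1 only, with threshold rho'/rho = 0.05.
x = {"t1": {"a": 0.97, "b": 0.03}, "t2": {"a": 0.5, "c": 0.5}}
x_bar = prune_strategy(x, {"t1"}, 0.05)
print(x_bar)  # {'t1': {'a': 1.0}, 't2': {'a': 0.5, 'c': 0.5}}
```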

We use critically from Lemma 6.1 that $\alpha(u)/\mu(u,s)$ is approximately $P_u(u,s)$ (within a factor of $2\epsilon^*$) for any $u\in R$, so that, from Lemma 3.2 and Lemma 6.1, with the change from $(x,y)$ to $(\bar{x},y)$ the ratio of visits at $t\in R$ to those at $s$ cannot increase by more than a factor of $[\dots]$. Furthermore, by the definition of $\bar{x}$, $[\dots]$, $\delta<(\epsilon^*)^2$ and Lemma 5.5, there are no non-satellite exits performed inside of $T$ other than those generated by Player Two moves $b\in A^2_t$ with $w_{x,y}(b)=\xi_{x,y}(t)$. Combined with the facts that the absorption rate of any move $b$ with $g_b\ge 2.4\epsilon$ is altered by a factor of no more than $m/(2\rho\epsilon Q_2)$ by the switch to $\bar{x}$, and that $2\epsilon^*$ is greater than the probability that the last visit to $R$ was at a satellite exit, we have everything but the claim that there is only insignificant motion with $(\bar{x},y)$ toward absorption from states in $R$ outside of the set $T$.

We can break up the absorption from $R$ generated by the strategies $(\bar{x},y)$ in terms of where the last visit in $R$ was. Let $t\in T$, $u\in R\setminus T$ and $b\in B$ be a move such that $p(u\mid t,\bar{x},b)>0$, necessarily with $p(u\mid t,\bar{x},b)\ge\rho'$. To complete the claim of Step 1 it suffices to show that $\mathrm{esc}(u,t)/\mu(u,t)<[\dots]$ for every such $u\in R\setminus T$. Here, for every $d>1$, we define

$$T_d=\{t\in R\mid \mu(s,t)<d\,\alpha(s)\}\cup\{s\}.$$

Case 1: $u\in R\setminus T$ is reachable from $T$ only by Player Two moves $b$ with $g_b\ge\epsilon$:

It follows immediately from the fact that Player Two has no more than $m|R|$ moves in $R$ that $\mathrm{esc}(u,t)/\mu(u,t)$ is smaller than $2.5/Q_1$, since any such move does not return to $R$ with a probability of at least $2.5\epsilon$, and with probability at least $1-2\epsilon^*$ there is motion from $u$ back to $t\in T$.

Case 2: $t\neq s$, and $u\in R\setminus T$ is reachable by $(\bar{x},y)$ from a $t\in T$ by a move $b$ of Player Two with $g_b<\epsilon$:

By Lemma 5.5 we have

$$\rho^*\,\mathrm{esc}(u,t)\le g_b\le 2.3\,\alpha(t).$$

Since $\rho^*=[\dots]$, we have $\mathrm{esc}(u,t)\le 2.3\,Q_2\,\mu(s,t)$. Since $\mu$ is a metric, we have from $\mu(s,u)\ge L\mu(s,t)$

$$\frac{\mathrm{esc}(u,t)}{\mu(u,t)}\le\frac{2.3\,Q_2\,\mu(s,t)}{\mu(u,t)}\le\frac{2.3\,Q_2}{L-1}\le\frac{2.4}{Q_1}.$$

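Case 3: $u\in R\setminus T$ is reachable by $(\bar{x},y)$ from $s$ by a move $b$ of Player Two with $g_b<\epsilon$:

We have $\rho_s=1/Q_2$, $\rho_s\,\mathrm{esc}(u,s)\le g_b\le 2.3\,\alpha(s)$ and

$$\frac{\mathrm{esc}(u,s)}{\mu(u,s)}\le\frac{2.3\,\alpha(s)Q_2}{\mu(u,s)}\le\frac{2.3\,\alpha(s)Q_2}{L\alpha(s)}\le\frac{2.3}{Q_1}.$$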

In all arguments that follow concerning members of a set $T$ as created above, for convenience we will write $z_\epsilon$ or $B_\epsilon$ instead of $z_\gamma$ or $B_\gamma$ for $\gamma\ge\epsilon$. By Lemma 5.5 there will be no difference in these expressions.

Step 2: For any choice of $s\in R$ from Step 1 there is an $\epsilon$-simplification $\bar{y}$ of Player Two's strategy $y$ such that, together with $\bar{x}$, the state $s$ and all states $t\in T$ with $z^{\epsilon}_{\bar{x},y}(t)\ge\epsilon\epsilon\,z^{\epsilon}_{\bar{x},y}(s)/(4|\mathcal{N}|)$ are reached by $(\bar{x},\bar{y})$ from all of $R$, and furthermore from inside of $T$ no state outside of $T$ is reached:

We define $\bar{y}^t$ for all $t\in T$ by removing from $y^t$ all moves made by Player Two with a frequency of $L/((L-1)Q_1)$ or less, followed by normalization.

Any $t\in T$ that satisfies $z^{\epsilon}_{\bar{x},y}(t)\ge\epsilon\epsilon\,z^{\epsilon}_{\bar{x},y}(s)/(4|\mathcal{N}|)$ by Step 1 also satisfies $P_{\bar{x},y}(s,t)\ge[\dots]\,P_{x,y}(s,t)$ and $z^{\epsilon}_{x,y}(t)\ge[\dots]$. Notice that this last condition is satisfied by the state $t=s$. For any $t\in T$ with $P_{\bar{x},y}(s,t)\ge[\dots]\,P_{x,y}(s,t)$ and $z^{\epsilon}_{x,y}(t)\ge[\dots]$, to show that $t$ is reached from all of $T$ with $(\bar{x},\bar{y})$, by Lemma 3.4 it suffices to show that for any $w\in T$ and any $t\in T$ satisfying $z^{\epsilon}_{\bar{x},y}(t)\ge[\dots]$, including $t=s$, the change from $y^w$ to $\bar{y}^w$ does not reduce $P_{\bar{x},y}(w,t)$ by more than a factor of $\epsilon\epsilon\beta/(12|\mathcal{N}|^2)$.

If $b\in A^2_w$ is a Player Two move with $g_b\ge\epsilon$, removing $b$ to form $\bar{y}^w$ from $y^w$ cannot reduce $P_{\bar{x},y}(w,t)$ by anything more than a factor of $\epsilon^*/\epsilon$. Assuming that $g_b<\epsilon$ and that removing $b$ to make $\bar{y}^w$ removes at least a fraction $[\dots]$ of the motion $P_{\bar{x},y}(w,t)$, we would have from Lemma 5.5 that

$$2.3\,\alpha(w)\,\frac{L}{(L-1)Q_1}\ \ge\ g_b\,\frac{L}{(L-1)Q_1}\ \ge\ g_b\,y^w_b\ \ge\ [\dots]\,\alpha(w).$$

This is a contradiction to the definition of $Q_1$.

Second, we show that, starting at $s$, motion according to $(\bar{x},\bar{y})$ never leaves the set $T$. Let us assume that $u$ is a state not in $T$ reached by a move $b$ of Player Two from some $t\in T$ played against $\bar{x}$ and given positive frequency by $y$. We need to show that $b$ is not used in $\bar{y}$. If $t\neq s$, then formula (3) bounds the frequency $y_b$; in particular, by the definition of $T$ and since $\mu$ is a metric,

$$y_b\le\frac{\mu(s,t)Q_2}{\mu(t,u)}=\frac{\mu(s,t)Q_2}{\mu(s,u)}\cdot\frac{\mu(s,u)}{\mu(t,u)}\le\frac{Q_2}{L-1}\le\frac{L}{(L-1)Q_1}.$$

And if $b$ is a move at the state $s$, then also by the definition of $T$ and (3)

$$y_b\le\frac{\alpha(s)Q_2}{\mu(s,u)}\le\frac{Q_2\,\alpha(s)}{L\alpha(s)}\le\frac{1}{Q_1}.$$

Therefore we conclude that $(\bar{x},\bar{y})$ defines one ergodic set $D\subseteq T$ that includes $s$ and all states $u\in R$ satisfying $z^{\epsilon}_{\bar{x},y}(u)\ge[\dots]$.

Step 3: Show that there is a proper choice of $s$ from Steps 1 and 2 with a subset $V_R$ of Player Two moves satisfying the conditions of Theorem 1, namely that these moves belong to a subset $F$ containing $s$ and inside of the ergodic set $D$ such that $w(t)\alpha(t)$ is a constant for all $t\in F$, and there is a distribution on $V_R$ that, used against $\bar{x}$, gives an expected payoff to Player One within $\epsilon$ of $r^1(s)$:

Define $\tilde{U}:=\{t\in R\mid\xi(t)<r^2(t)-2.4\epsilon\}$ and define $U:=\{t\in R\mid\xi(t)<r^2(t)-2.5\epsilon\}\cap\{t\in R\mid z^{\epsilon}_{x,y}(t)>\beta\}$. We create a partition $\{U_1,\dots,U_k\}$ of the members of $U$ in increasing values of $w\alpha$, meaning that $s$ and $t$ belong to the same member of the partition if and only if $w(s)\alpha(s)=w(t)\alpha(t)$. For any state $s$ in $U$ we consider the sets $T(s)$ and $D(s)$ and the strategies $(\bar{x}(s),\bar{y}(s))\in X\times Y$ as created above in Step 1 and Step 2.
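A minimal Python sketch of the partition $\{U_1,\dots,U_k\}$ follows, assuming the values $w(t)$ and $\alpha(t)$ are available as exact rationals in dictionaries (a hypothetical input format). Exact arithmetic matters because the text groups states by exact equality of $w\alpha$, which floating-point rounding could break.

```python
from fractions import Fraction
from itertools import groupby


def partition_by_w_alpha(U, w, alpha):
    """Split U into blocks U_1, ..., U_k of equal w*alpha, in increasing order.

    Two states share a block exactly when w(s)*alpha(s) == w(t)*alpha(t).
    """
    def key(state):
        return w[state] * alpha[state]

    ordered = sorted(U, key=key)  # groupby needs equal keys to be adjacent
    return [set(block) for _, block in groupby(ordered, key=key)]


# Hypothetical values: s1 and s2 share w*alpha = 1/2, while s3 has w*alpha = 1.
w = {"s1": Fraction(1), "s2": Fraction(2), "s3": Fraction(1, 2)}
alpha = {"s1": Fraction(1, 2), "s2": Fraction(1, 4), "s3": Fraction(2)}
blocks = partition_by_w_alpha(["s3", "s1", "s2"], w, alpha)
print(blocks)
```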

For the sake of contradiction, we suppose that there is no $s\in U_i\cap U$ and $b\in B^{\epsilon}_{x,y}(t)$ with $t\in D(s)\cap\tilde{U}$ such that $|v^1_{\bar{x}(s),y}(b)-r^1(s)|<\epsilon$, and there is no pair of Player Two moves $b,b'\in B_{x,y}(R)$ with both $b$ and $b'$ belonging to the set $D(s)\cap U_i$ with $v^1_{\bar{x}(s),y}(b)$ and $v^1_{\bar{x}(s),y}(b')$ on different sides of $r^1(s)$.

For every $s\in U$ and $t\in\tilde{U}\cap D(s)$ with some move in $B_{x,y}(t)$ used in $\bar{y}^t$, let

$$\bar{v}^1(t)=\frac{\sum_{b\in B^{\epsilon}_{x,y}(t)}v^1_{\bar{x}(s),y}(b)\,z^b_{\bar{x}(s),y}}{\sum_{b\in B^{\epsilon}_{x,y}(t)}z^b_{\bar{x}(s),y}},$$

the average Player One payoff resulting from these moves at $t$. For every $1\le i\le k$ let $p(i):=[\dots]$.

We claim that our above assumption implies that $z^{\epsilon}_{x,y}(s)<3^{p(i)}\beta/\epsilon^{p(i)}$ for every $s\in U\cap U_i$.
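The average $\bar{v}^1(t)$ is an importance-weighted mean, and the contradiction hypothesis concerns on which side of $r^1(s)$ such payoffs fall. The following Python sketch illustrates both computations on plain numbers; the `(payoff, weight)` input format and the function names are hypothetical, with each weight standing in for the importance of a move.

```python
def average_payoff(moves):
    """Importance-weighted average of Player One payoffs over the moves used at t.

    `moves` is a list of (payoff, weight) pairs; the weight plays the role of
    the importance of the move.
    """
    total_weight = sum(weight for _, weight in moves)
    if total_weight <= 0:
        raise ValueError("no move carries positive weight")
    return sum(payoff * weight for payoff, weight in moves) / total_weight


def on_same_side(values, r):
    """True if every value lies weakly on one side of the reference payoff r."""
    return all(v >= r for v in values) or all(v <= r for v in values)


print(average_payoff([(1.0, 3.0), (0.0, 1.0)]))  # 0.75
print(on_same_side([0.2, 0.4], 0.5), on_same_side([0.2, 0.7], 0.5))  # True False
```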

We prove the above claim by induction on $i$. Let $s$ be any member of $U_i\cap U$, and assume that $|\bar{v}^1(s)-r^1_{\bar{x}(s),\bar{y}(s)}(s)|=|\bar{v}^1(s)-r^1_{x,y}(s)|\ge\epsilon$. From Part 1 and Part 2 we know that the importance with respect to $(\bar{x}(s),y)$ from exits outside of $B^{\epsilon}_{x,y}(D(s))$ does not exceed $\epsilon\epsilon\beta$ times the importance of the $B^{\epsilon}_{x,y}(s)$ moves. Since all the $\bar{v}^1(t)$ with $t\in D(s)\cap U_i$ are on the same side of $r^1(s)$ as $\bar{v}^1(s)$, we are left only with the $B^{\epsilon}_{x,y}$ moves from $\bigcup_{k<i}U_k\cap D(s)$ to counterbalance $\bar{v}^1(s)$ so as to make $r^1_{\bar{x}(s),y}=r^1_{x,y}$. We can now assume that $i>1$, since otherwise we would have to conclude that $|[\dots]|\ge\epsilon$, which is impossible. By the induction hypothesis the sum of all the $z_{x,y}(u)$ over the set $\bigcup_{k<i}U_k$ does not exceed $\sum_{k<i}|U_k|\,3^{p(k)}\beta/\epsilon^{p(k)}<[\dots]$. By the fact that our simplifications $\bar{x}(s)$ hardly influence the expected payoffs from moves with an absorption rate of at least $2\epsilon$, and by the statement of Step 1, in order to maintain $|\bar{v}^1(s)-r^1_{x,y}(s)|\ge\epsilon$ we must have $z^{\epsilon}_{x,y}(s)<[\dots]$, and this concludes the proof of our claim.

With the definition of $\beta$ we conclude that $z^{\epsilon}_{x,y}(s)<\epsilon/|R|$ for every $s\in R$, and this means that $R$ could not have been chosen for polarization, a contradiction.

With the appropriate $s\in R$ chosen, we have $D_R:=D(s)$, $x_c$ defined from $\bar{x}(s)$ and $y_c$ defined from $\bar{y}(s)$ so that changes are made only inside of $D_R$, and the exits $V_R$ and their distribution as determined by $y_c$ come from the above argument.

Step 4: Show that the moves $B_R$ satisfy the requirements of Theorem 1:

The easiest way to prove that $|r_{x_c,y_R}(s_R)-r_{x,y}(s_R)|<\epsilon$ is to return part of the way back to a coarser space: we let $\mathcal{S}_{\mathcal{P}}$ be the space generated by the almost trivial partition $\mathcal{P}:=\{R\}\cup\{\{s\}\mid s\notin R\}$. With $\hat{r}^1$ the harmonic function on $\mathcal{S}_{\mathcal{P}}$ induced by $r^1$ on the absorbing states, by Lemma 3.5 $\hat{r}^1(s_R)$ and $r^1(s_R)$ differ by at most $4\epsilon^*$. Let $\nu_\delta$ be the corresponding measure of the importance of the exits.
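The argument above uses the harmonic function induced by $r^1$ on the absorbing states. For a generic finite absorbing Markov chain such a function can be computed by fixed-point iteration, as in the Python sketch below. It is a generic illustration, not the construction of $\mathcal{S}_{\mathcal{P}}$ itself; the dictionary format and names are hypothetical, and it assumes absorption occurs with probability one from every non-absorbing state.

```python
def harmonic_extension(P, boundary, tol=1e-12, max_iter=100_000):
    """Harmonic function of a finite absorbing Markov chain with given boundary values.

    P maps each non-absorbing state to {next_state: probability}; `boundary`
    maps each absorbing state to its payoff. Under the absorption assumption
    the Gauss-Seidel iteration h(s) <- sum_t P(s, t) h(t) converges.
    """
    h = {state: 0.0 for state in P}
    h.update(boundary)
    for _ in range(max_iter):
        largest_change = 0.0
        for state, row in P.items():
            new_value = sum(p * h[nxt] for nxt, p in row.items())
            largest_change = max(largest_change, abs(new_value - h[state]))
            h[state] = new_value
        if largest_change < tol:
            return h
    raise RuntimeError("no convergence: is the chain really absorbing?")


# Gambler's-ruin example: transient states 1, 2; absorbing 0 (payoff 0) and 3 (payoff 1).
P = {1: {0: 0.5, 2: 0.5}, 2: {1: 0.5, 3: 0.5}}
h = harmonic_extension(P, {0: 0.0, 3: 1.0})
print(round(h[1], 6), round(h[2], 6))  # 0.333333 0.666667
```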

Define a move $a$ of Player One at a state of $R$ to be a principal move if $a$ is not a limit move and if there is a $b\in B_R$ such that $(a,b)$ is an exit with $\nu_{x,y}(a,b)\ge v-\epsilon^*$.

We claim that the combination $(a,b)$ of a move $b\in B_R$ with a principal move $a$ of Player One must yield $u(a,b)<\epsilon^*$. Once this is established, from the definition of $B_R$ we need only to break down the sum of the $v^r(a,b)\,\nu_\delta(a,b)$ over all exits $(a,b)$ with $v(a,b)\ge v-\epsilon^*$ and apply Lemma 3.7 to conclude that $r_{x,y_R}(s_R)$ is within $20|M|m\epsilon^*/v$ of $r_{x,y}(s_R)$, which is much closer than we need. Suppose, for the sake of contradiction, that for some principal $a$ and some $b\in B_R$ we have $u(a,b)\ge v-\epsilon^*$. Assuming that the moves take place at $t$, we have from the definition of $B_R$ that $\alpha(t)\ge y_b(q-\epsilon^*)(\mu-\epsilon^*)\rho$. Furthermore, by definition we have $u(a,b)\le x_ay_b/\alpha(t)$, and by assumption $x_a<\epsilon^*$. These four inequalities are contradictory.

We show that $b\in B_R\cap A^2_t$ with $t\in P$ implies $v^2(b)\le r^2(t)$. If $\xi(t)<r^2(t)-2\epsilon$, then this follows from Lemma 5.5. If $\xi(t)\ge r^2(t)-2\epsilon$, then by Lemma 5.4 all moves have the same auxiliary value $\xi(t)$; it follows from the smallness of $\delta<(\epsilon^*)^2$ and formulas (10) and (11) that if $v^2(b)>r^2(t)$, then the repeated use of $b$ would result in a higher evaluation for $\xi(t)$, because an undiscounted value of at least $r^2(t)$ would be obtained but at a much higher auxiliary absorbing rate.

Step 5: Show $z^{2.5\epsilon}(s)<[\dots]$ for any state $s$ that is not in $P$ or is a satellite of some $R\in\mathcal{R}$:

If $s$ is not a satellite and not in $P$, then, due to the very small size of $\delta$, we have from (11) that $\xi(s)$ is within $\epsilon$ of $r^2(s)$, implying that no move $b$ used at $s$ could satisfy $w^2(b)<r^2(s)-2\epsilon$. For a satellite $s$ of $R$ we suppose that $b\in A^2_s$ is a Player Two move at $s$ with $g_b\ge 2.5\epsilon$. Such moves have at least a $2.4\epsilon$ probability of never returning to the set $R$. The probability of using such a move before reaching $R$ must be no more than $\epsilon^*/(2.4\epsilon)$, and thus the total probability that it is used cannot exceed $\epsilon^*/(2.4\epsilon(\mu-\epsilon^*))$. q.e.d.

7 Signaling 

In this section we show that there are approximate equilibria without the assumption that Player One can send signals independent of the transitions. The problem concerns the consequences to the players of any moves that would be used by Player One as a transition-dependent signal. For example, a move of Player One that brings the play outside of the set $D_R$ may fail to be useful for signaling her desire for Player Two to use a move in $V_R$, because outside of $D_R$ the jump function for Player Two may greatly exceed his expected payoff from the moves in $V_R$.

The natural solution is for Player One to have a move inside of $D_R$ that is not used in $x_c$, whose use means that the moves $V_R$ of Player Two will not be used, and after a certain number of visits to some state in $D_R$ it will be mutually understood that Player Two must use a move in $V_R$. A problem arises, however, if every such move results in a positive probability of leaving the set $R$.

With regard to the next two theorems, we assume the statement and proof of Theorem 2, which means also that we assume that all the conditions of Theorem 1 are satisfied. We will add new conditions to those of Theorem 1 and make some minor changes to the proof of Theorem 1. The definition of $\mathcal{S}_\delta$ remains, along with its Markov chain transitions, including the $p^*_R$ and $\rho_R$. The changes begin with the definition of the parts $q^0_R$ and $q_R$, and therefore everything that follows in the proof of Theorem 1 will be altered as well, including the introduction of new situations.

Theorem 1': Assume the following property for every $R\in\mathcal{R}$: if every move $a\in A^1_t$ in $D_R$ removed to make $x_c$ from $x$ formed an exit against some Player Two move used in $y_c$, then there exists a set $A_R$ of Player One principal moves in $D_R$ such that

1) the sum of $\nu_{x,y}(a,b)$ over all $R$-exits $(a,b)$ performed outside of $D_R$ does not exceed $\epsilon\epsilon\epsilon\beta/3$,

2) for every principal move $a\in A_R$ of Player One used at $t\in D_R$ with $z_{x,y}(a)\ge\beta\epsilon\epsilon\epsilon/(3|\mathcal{N}|m)$ we have $\sum_{b\text{ used in }y_t}\nu_{x,y}(a,b)\ge(1-[\dots])\,\nu^a_{x,y}$, and therefore also $|v^1_{x,y}(a)-v^1_{x,y}(t)|<\epsilon$.

Conclusion: Without any assumption on Player One's ability to signal independently of the transitions, the game has approximate equilibria.

Proof: Define a member of $\mathcal{R}$ to be problematic if the assumption of Theorem 1' holds. We proceed exactly as in the proof of Theorem 1, except that for all problematic $R$ we incorporate into the $\mathcal{S}_\delta$ transition $q^0_R$ all the $R$-exits that are not inside of $D_R$ or are not created from a combination of an $a\in A_R$ with a move used in $y_c$. Recalling that $q_R$ is the difference between $q^0_R$ and $p^*_R$, by Lemma 3.7 we still have that $\nu_\delta(q_R)\,(v^1(q_R)-r^1(s_R))$ is below $\epsilon$. Due to Condition 2 and Lemma 3.7 we have the other requirement for applying Lemma 3.9. We assume that $\mathcal{T}$ is the subset of $\mathcal{R}$ that has been polarized.

Define a situation $s^w$ at a state $s$ to be timed if there is a natural number $m$ such that $s^w$ is determined by the present state $s$ and the previous situations and moves in the last $m$ stages. A normal situation is timed, but the converse does not hold.
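A timed situation is a finite-memory object: it depends only on the present state and a window of the last $m$ stages. The Python sketch below illustrates that definition; the rule that decides the situation, the class name and the record format are hypothetical illustrations rather than objects from the proof.

```python
from collections import deque


class TimedSituationTracker:
    """Situations that depend only on the present state and the last m stages.

    `rule(state, history)` returns the current situation, where `history` is a
    tuple of at most m (situation, moves) records from the preceding stages.
    """

    def __init__(self, m, rule):
        self.history = deque(maxlen=m)  # records older than m stages drop out
        self.rule = rule

    def step(self, state, moves):
        situation = self.rule(state, tuple(self.history))
        self.history.append((situation, moves))
        return situation


# Hypothetical rule: tag the state with how many of the last m stages were at it.
def rule(state, history):
    return (state, sum(1 for (past_state, _), _ in history if past_state == state))


tracker = TimedSituationTracker(m=2, rule=rule)
print([tracker.step(s, ("a", "b")) for s in ["s", "s", "s", "t"]])
# [('s', 0), ('s', 1), ('s', 2), ('t', 0)]
```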

We keep the same situations $s^e$ and $s^g$ from the proof of Theorem 1. The stationary strategies for all the $s^g$ and all the $s^e$ other than a representative $s^e_R$ are defined in the same way, and in a non-problematic $R$ the stationary strategies for $[\dots]$ are also the same.

For every polarized $R\in\mathcal{T}$ and $t\in D_R$ we create a timed situation $t^h$. When a situation $s^e_R$ is reached the strategies $(x_c,y_c)$ are performed, but instead of moving to a $t^*$ or $t^g$ there is motion to the timed situation $t^h$.

For non-problematic polarized $R\in\mathcal{T}$ we choose any $t\in D_R$ such that there is a Player One move $a$ at $t$ that is not used in $x_c$ and that, when paired with $y_c$, results in zero probability of leaving the set $R$. Choose a frequency $f_a>0$ and a number $n_t$ such that $f_a\sum_{i=0}^{n_t-1}(1-f_a)^i=1-\lambda_R$, where $\lambda_R$ is the quantity determined by the polarization, and such that for any distinct $u,v\in D_R$ the probability of moving from $u$ to $v$ before using the move $a$ is at least $1-\epsilon^*$. For all the situations $s^h$ with $s\neq t$ the players act according to $(x_c,y_c)$, and at $t^h$ Player Two acts according to $y_c$ and Player One according to $(1-f_a)x_c+f_ax_a$. If by the $n_t$-th visit to the situation $t^h$ the move $a$ has not been made, then the situation following $t^h$ is some $u^g$. Otherwise, if the move $a$ was used on any visit to the situation $t^h$, then the next situation is either some $u^*$ if an exit was not used or some $u^e$ if an exit was used.
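The condition on $f_a$ and $n_t$ in the preceding paragraph, which we read as $f_a\sum_{i=0}^{n_t-1}(1-f_a)^i=1-\lambda_R$, simplifies because the geometric sum equals $1-(1-f_a)^{n_t}$: the condition says $(1-f_a)^{n_t}=\lambda_R$, i.e. the probability that $a$ is never used in $n_t$ visits is $\lambda_R$. Taking $n_t$ large makes $f_a$ small, which is what the requirement on motion between states of $D_R$ needs. In the Python sketch below, the cap `f_max` is a hypothetical stand-in for that requirement.

```python
import math


def signal_frequency(lam, n):
    """Frequency f with f * sum_{i=0}^{n-1} (1 - f)**i == 1 - lam.

    The geometric sum equals 1 - (1 - f)**n, so the condition is (1 - f)**n == lam.
    """
    if not 0.0 < lam < 1.0 or n < 1:
        raise ValueError("need 0 < lam < 1 and n >= 1")
    return 1.0 - lam ** (1.0 / n)


def smallest_n_for_cap(lam, f_max):
    """Smallest n >= 1 whose frequency signal_frequency(lam, n) is at most f_max."""
    if not 0.0 < f_max < 1.0:
        raise ValueError("need 0 < f_max < 1")
    # 1 - lam**(1/n) <= f_max  <=>  n >= log(lam) / log(1 - f_max) (both logs negative).
    n = max(1, math.ceil(math.log(lam) / math.log(1.0 - f_max)))
    while signal_frequency(lam, n) > f_max:  # guard against rounding upward
        n += 1
    while n > 1 and signal_frequency(lam, n - 1) <= f_max:  # and downward
        n -= 1
    return n


f = signal_frequency(0.25, 2)
print(f, f * sum((1 - f) ** i for i in range(2)))  # 0.5 0.75
n_t = smallest_n_for_cap(0.25, 0.01)
print(n_t, signal_frequency(0.25, n_t))
```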

For problematic $R\in\mathcal{R}$, let $\pi_R\in\Delta(A_R)$ be the probability distribution on $A_R$ that is generated conditionally by $(x,y_c)$. Choose a natural number $n_R$ and a stationary strategy $x^*_c$ for Player One so that, with a start at $s_R$, the distribution on the moves $A_R$ is $\pi_R$, for every pair $u,v\in D_R$ the probability of using a move in $A_R$ before moving from $u$ to $v$ is no more than $\epsilon^*$, and the probability of using some member of $A_R$ at or before the $n_R$-th visit to the state $s_R$ is $1-\lambda_R$. For the situations $t^h$ with $t\in D_R$ the players act according to $(x^*_c,y_c)$. If by the $n_R$-th visit to the situation $s^h_R$ no move in $A_R$ has been made, then the next situation is some $u^g$. Otherwise, if a move in $A_R$ was used on any visit to a situation $t^h$, then the next situation is either $u^*$ (if an exit was not used) or $u^e$ (if an exit was used). At a situation $u^*$ the strategies $(x_c,y_c)$ are also used.

As in the proof of Theorem 1, we must show that the expected payoff to Player $i$ from every situation $s^e$ is within $3.1\epsilon$ of $r^i_{x,y}(s)$. Given the proof of Theorem 1, the only additional argument needed concerns the use of exits in a problematic $R$ before the timed situations have been reached. This did not present a problem in the proof of Theorem 1 because they were the same exits used in the situations $t^*$ and were performed with the same distributions. If we can show that the total probability of their occurrence cannot exceed $\epsilon/10$, then we get our result by ignoring their influence. Indeed, in the Markov chain defined on $\mathcal{S}_\delta$ the absorption rate of $s_R$ for a problematic $R$ is at least $\rho_R/(2Q_1)$. By Lemma 3.9 this absorption rate does not fall below $[\dots]$ after polarization. Since this quantity is still very large compared to $\epsilon^*$, the maximal probability of using such an exit before a timed situation is reached, we can indeed ignore these exits. (We leave the formal argument using Section 3 to the reader.)

The situations defined above are not normal and thus do not generate a stochastic game, which prevents a direct application of Corollaries 4.3 and 4.4. Therefore we must regard the situations $\{s^h\mid s\in R\}$ for $R\in\mathcal{T}$ as subgames. Concerning the behavior of Player One, we view the entire process up until the $n_R$-th visit to the state $s_R$ or the $n_t$-th visit to $t$ as a single decision: whether or not to use a move in $A_R$, and if so, which one. This places Player One's decisions back into the context of Corollary 4.4.

Concerning the behavior of Player Two, the matter is more complex. Player Two could influence the payoffs by altering the strategy $y_c$. Strictly speaking, the context would then no longer be that of a harmonic function on a time-homogeneous Markov chain: the expected payoff to Player Two at a state corresponding to a situation $t^h$ would be changing over time. However, Player Two's ability to gain or lose in expected payoff is conditioned on the use of a move of Player One in $A_R$; this is modeled by a time-homogeneous Markov chain, and therefore Proposition 4.2 is sufficient for the conclusion. □

Theorem 2': The conditions of Theorem 1' are always satisfied.

Proof: Let $(x,y)\in X\times Y$ and $(x_c,y_c)$ be a solution given by Parts 1, 2 and 3 of Theorem 2 for a polarized $R\in\mathcal{R}$, and assume that the conditions of Theorem 1' hold for $R$ (meaning that $R$ is problematic).

1) Consider the strategies played at any $t\in D_R$. Suppose, for the sake of contradiction, that there is a state $u\in R\setminus D_R$ where an importance of at least $\epsilon\epsilon\epsilon/(3|R|)$ occurs from exits at $u$. Consider the moves that were removed from $y^t$ to make $y^t_c$. By Lemma 5.5, at any $t\in D_R$ no more than $[\dots]$ of the transition $P^*(t,u)$ was removed to make $y^t_c$ from $y^t$. On the other hand, given that every move of Player One removed from $x^t$ to make $x^t_c$ would have created an exit against some move in $y^t_c$, we must also conclude from the rare use of an exit that no more than $2\epsilon^*Q_1$ of the transition in $P_{x,y}(t,u)$ came from such a Player One move. From Lemma 3.3 we have that $u$ is in $D_R$, a contradiction.

2) Assume that $z_{x,y}(a)\ge\beta\epsilon\epsilon\epsilon/(3|\mathcal{N}|m)$ for some principal move $a$ of Player One at $t\in D_R$. Suppose, for the sake of contradiction, that the probability of reaching any absorbing state from this principal move is altered by a factor of more than $\beta\epsilon\epsilon\epsilon/2$ by the change from $y$ to $y_c$. This means that $\nu_{x,y}(a,b)$ is at least $[\dots]$ for at least one move $b$ that was removed to make $y^t_c$ from $y^t$. We must conclude from Lemma 5.5 that $[\dots]$, a contradiction to the definition of $Q_1$.

The final claim now follows from the argument in part 4 of the proof of Theorem 2, showing that $v^1_{x,y}(a)$ is very close to the value of $r^1$ for all principal moves. □

In the proof of Theorem 1' we could eliminate the argument that exits performed before reaching a timed situation in a problematic set are irrelevant if we had a more powerful Markov chain result (one that combines the condition of Lemma 3.3 with the conclusion of Lemma 3.2), or if we used Vieille's approach to "communication sets" (Vieille 2000c), showing how one can move through a set $R$ with no danger of leaving it.

8 Countably many states

On the technical side, the problem with applying either our proof or Vieille's proof to countably many states lies in the property, guaranteed by the finiteness of the state space, that given any stationary strategies for the players and any positive $\delta$ there exists a $\delta$-perturbation of these strategies that is absorbing.

A strategy for finding a counter-example could be the following. Construct an infinite sequence of games $\Gamma_0,\Gamma_1,\dots$ that are positive recursive for both players, corresponding to increasing finite sets $S_0\subset S_1\subset\cdots$ of non-absorbing states, such that for every $i\ge 0$ and $j\ge i$ the moves and their induced motions inside of $S_i$ are the same for all games $\Gamma_j$. Construct a countable state space by having the game start at $s_0$, define the state space on the $i$-th stage to be the space $S_i$, and declare that absorption occurs on stage $i$ if an absorbing state of the game $\Gamma_i$ is reached. Furthermore, give both players the ability to force the game to absorption in the new countable state space game. Desirable may be games $\Gamma_i$ such that, for large $i$, the approximate equilibrium behavior of $\Gamma_i$ keeps the non-absorbing play most of the time close to the set $S_0$, and the minimal number of stages necessary to reach an absorbing state in the game $\Gamma_i$ starting from any $s\in S_0$ goes to infinity as $i$ goes to infinity. Otherwise, if we allow absorbing states to be reachable quickly from all non-absorbing states, then to avoid convergence toward large subgames of essentially equivalent states it may be desirable that reaching an absorbing state of $\Gamma_i$ on the $i$-th stage of play does not mean certain absorption but rather a positive probability of absorption mixed with a positive probability of starting the game over at $s_0$.

There are many ways for a game to have a countable state space but be played essentially on finitely many situations, for example games that break down into sequences of subgames played essentially on finite state spaces. Also to be avoided are structures that are formally countable in size but do not exploit the full potential of what it means to have infinitely many positional possibilities. We believe that the best candidates for a counter-example will incorporate the concept of a random walk on arbitrarily many positions, as presented in our introduction. However, to avoid operator approaches similar to that of the Maitra and Sudderth proof, we believe that there must be a conflict for both players between exploiting their positions and controlling the behavior of the other player. For this and other reasons, we believe that the non-absorbing states must have a structure more complex than $\mathbb{Z}$, for example involving joint random walks on the two-dimensional lattice $\mathbb{Z}^2$.

Additional Acknowledgment: The author thanks Cafe Europe on Azzahra Street in Jerusalem for the many hours he worked on the proof at the cafe.